A formulation of redblack trees intended to aid in understanding
and simplify implementation

The redblack tree is a data structure that supports logtime insert, delete,
and find operations. It is widely used; for example it is the basis
for most implementation of C++ standard library sets, maps, and other
containers.

The intention of this note is to present a mild reformulation of
redblack trees that simplifies the algorithms and presentation.
The hope is that this will both aid in understanding redblack trees
and simplify implementations.

Definition of redblack trees
A redblack tree is a binary tree in which each node contains a data field.
(The data fields may be integers, strings, or any data structures that support
a notion of comparison and ordering.) Data fields in a node's left subtree
are smaller than the node's data field, and data fields in the node's
right subtree are larger than the node's data field.

Each node in a redblack tree is marked with a color (red or black).

A redblack tree is wellformed if it satisfies the following two properties:

Red property:
Red nodes do not have red parents.

Black property*:
Siblings have the same maximum black height.

The maximum black height of a node is the maximum number of black nodes
on any simple path starting at the node and descending into its subtree.
(Empty subtrees have a black height of zero.)

For those already familiar with redblack trees and their usual definition
of black height, let me offer a couple of comments on the rationale for this
new definition:

Unlike the notion of black height, maximum black height is welldefined
for all subtrees.

Moreover, as will be seen below, during "delete"
operations there will be temporary violations of the black property.
There is a node in the tree whose maximum black height is less than that
of its sibling. However, the parent of those two siblings retains its
original maximum black height because of the sibling's black height.
So, it becomes possible to identify the single and only node in the tree
that violates the black property as the delete operation progresses.

This creates a nice symmetry with violations of the red property during
the insert operation. As the insert operation progresses, it is simple
to identify the single node in the tree that violates the red property,
because it is red has a red parent.

In what follows
we will sometimes use "weight" as a synonym for "maximum black height".
Where it is known that all simple paths from the root of a subtree to the
boundary nodes in the subtree have the same number of black nodes, we will
at times refer simply to the "black height" of the tree. In that case, the
standard notion of "black height" is defined for the subtree, and is the
same as the "maximum black height" of the subtree.

A complete path is a simple path from the root to a boundary node.

Redblack trees stay balanced to within a factor of two
An important consequence of the Black property is that all complete paths
in a wellformed redblack tree have the same number of black nodes.
(In fact, this is the definition usually given for the Black property; see
Afterword for the rationale for the new definition.)

This will allow us to bound the ratio of longest to shortest complete
paths, guaranteeing that redblack trees do not get too out of balance.
This crucial guarantee will allow us to conclude that all find, insert,
and delete operations can be completed in time O(log(N)).

Those familiar with red black trees will note that we use a nonstandard definition
of the black property, which will be refered to herein as the "Local black property"
when it is necessary to contrast it with the standard definition.

The emphasis of this site is understanding of the algorithms
and their correctness rather than their asymptotic complexity properties. The
reformulation we give here provides a nice local character to the black property, and
constitutes an intuitive precondition for the blackproperty repair algorithm.

We refer to the standard black property as the "global black property":

Global black property:
Every simple path from the root to a boundary node has
the same number of black nodes.

Use of the local black property comes at the price of incurring a proof
obligation. We need to show the equivalence of
the local black property and the global
black property in order to guarantee the desired O(logN) asymptotic behavior of
the insert, delete, and find operations on red black trees that are defined to
obey the local black property.

Lemma: If a redblack tree satisfies the local black property, then every subtree
satisfies the global black property.

The proof will be a simple induction on maximum black height. For the base case, note that
the global black property trivially holds for subtrees of maximum black height zero.

Now, inductively assume that for every subtree of maximum black height K,
all complete paths have K black nodes. Consider a subtree of maximum black
height K+1, and let's initially consider the case where the root is black.
One of the node's children must have a maximum black height of K. But by the Black
property both children have the same maximum black height, so they both must have
maximum black height K. From the inductive assumption we know that all complete
paths in these child subtrees have K nodes. A complete path in the
original subtree is a concatenation of the root node and a complete
path of one of its child subtrees. Since complete paths of the child
subtrees all have length K, the concatenated complete paths from the
root must all have length K+1.

Now if the original subtree has a red root node, its two children have
the same maximum black height by the Black property, which must be K+1 (since
the presence of the red node will not contribute to the subtree's
maximum black height.) By the previous result, all complete paths in these
two subtrees have K+1 black nodes. Complete paths in the original
subtree are concatenations of a red node (the root), and subtree
complete paths of maximum black height K+1. So again, all complete paths
have maximum black height K+1.
QED

Implications of the red and black properties for binary trees
From the Red property, we can see that a redblack tree
might have all black nodes. From the lemma just proved,
we see that all complete paths would have the same number of nodes.
That is to say, the tree would be a complete binary tree with
2^N  1 nodes.

Alternatively, a redblack tree might have a complete path that has some
red nodes and some black nodes. The longest possible path would start
and end with red nodes, and the nodes would alternate between red and black.

So, it is possible that in a redblack tree some complete paths might
be almost twice as long as others, but this is the worst it can get.
So, the height of the tree is bounded by 2*log(N). This bound on how out
of balance the tree can get makes it possible to guarantee that search
operations will all take time O(2*log(N)), which equals O(log(N)).

Importantly, as will be seen below, insert and delete operations can
also be accomplished in O(log(N)) time.

Terminology for nodes
We will need terminology to describe several relationships between nearby
nodes in a tree. Figure 1 gives the terminology we will use.

Figure 1: Here are the names of tree positions relative to NODE that
will be used in this note.

grandparent

+++
 
parent uncle

+++
 
NODE sibling
 
+++ +++
   
outside_child inside_child near_nephew far_nephew
The rotateUp(childNode) tree modification operation
The fundamental problem we are trying to solve is that a sequence of insert
and delete operations can make a binary tree badly out of balance.
(In the extreme, a binary tree can degenerate to becoming a long, narrow singly linked list.)
This makes binary search operations expensive, ruining the advantages of binary search.

We need an operation that can rebalance a binary tree.

Rebalancing of redblack trees is done via an important operation
called rotation. Rotation does a local rearrangement of a tree. It
has the useful properties that it does not change the inorder traversal
of the tree, and it does not introduce red/black violations into the tree.
We will formally define the operation "rotateUp()" below, but first, here is an
illustration of its operation:

Figure 2: rotateUp(node A) maintains the order of the nodes in the tree.

B > A <
 
+++ ===> +++
   
> A < greater than B less than A B
 
+++ +++
   
less than A between A and B between A and B greater than B
Definition: The "rotateUp()" operation can be applied to a nonroot
red node A that does not have a red sibling. Node A replaces its parent;
the inside child of A is changed to be the inside child of what was the
parent node, and the former parent node replaces the removed inside child of A.
Finally, the colors of node A and the former parent node are swapped.

Figure 3. rotateUp(NODE) takes a red node that does not have a red sibling
as its argument. The red and black properties of the tree are preserved
by the transformation.

parent (black) NODE (black)
 
+++ ===> +++
   
NODE (red) T3 (not red) T1 (not red) parent (red)
 
+++ +++
   
T1 (not red) T2 (not red) T2 (not red) T3 (not red)
Several things are worth noting about this operation. (We leave the proofs
as exercises.)

The modified tree is still a wellformed redblack tree.

The inorder traversal of the tree is not changed.

The operation can be used to revert its own tree modifications; the
rotateUp() operation can be applied to the new red node, and the
original tree is restored.

The NODE and its parent switch colors, so that the root of the new
subtree is still black, preventing any potential redblack tree property
violations between the new subtree and nodes outside of the subtree.

No maximum black heights are changed in the new tree.

The outside subtrees T1 and T3 keep their respective parents, but those
parents change colors.

The middle subtree T2 changes parents, but the color of its parent does
not change.

The presence of the red node requires its parent and children (if they
exist) to be black, because of the Red property of wellformed redblack
trees.

What happens when we violate the precondition of rotateUp()?
We required that rotateUp() only be applied to a red node that does not
have a red sibling. It is interesting, though, to take a moment and think
about what happens if we break this rule in different ways.

If rotateUp() is applied to a red node that has a red sibling, the resulting
tree ends up with a red property violation. (This is easy to check.) But an
interesting implication of this follows from the fact that doing this operation and then undoing
it is the identity. So, the undoing, which is also a rotateUp(), can take a
tree with a redproperty violation and fix it. As we will see, this observation
will help in the insertion algorithm below, which requires fixing a tree that has
a redproperty violation. (See Figures 4 and 5 below.)

If rotateUp() is applied to a black node, it breaks the black property in two
places. First, the sibling node has its maximum black height increased by one.
Second, the node's outside child has its maximum black height decreased by one.
(Figures 6 and 7 below illustrate this operation.)

Also, if the black node has a red outside child and a red parent, a red
property violation will occur.

As we will see, this application of rotateUp() which changes the maximum black
heights of nodes will help with the deletion algorithm below.
Deletion causes sibling subtrees to have different maximum black heights. A
tree manipulation that changes the maximum black heights of nodes will prove
useful in repairing this damage.

As we have suggested,
the rotateUp() operation can take a tree that violates one of the
redblack tree properties, and change it into a wellformed redblack
tree.

For example, here is a tree that violates the red property, because
node and PARENT are both red. (In this and the following examples, we
capitalize the node on which the rotateUp() operation will be performed
for emphasis.)

Figure 4. This tree violates the Red property, because node and PARENT
are both red.

grandparent (black)

+++
 
PARENT (red) uncle (black)

+++
 
node (red) sibling (black)

+++
 
T1 (black) T2 (black)
Applying rotateUp(PARENT), node's parent has its color changed as
noted above, and because the subtree was a wellformed redblack tree
otherwise, the removal of the redproperty violation leaves the tree
wellformed:

Figure 5. rotateUp(PARENT) changes node's parent from red to black,
fixing the tree.

PARENT (black)

+++
 
node (red) grandparent (red)
 
+++ +++
   
T1 (black) T2 (black) sibling (black) uncle (black)
As will be seen below, the above transformation is the crucial one for
inserting new nodes into redblack trees.

As another example, consider a tree that violates the black property.
The light_node has a maximum black height of one, and SIBLING has a maximum black height of 2.

Figure 6. This tree violates the Black property at light_node.

parent (red)

+++
 
light_node (black) SIBLING (black)

+++
 
near_nephew (black) far_nephew (red)

+++
 
T1 (black) T2 (black)
If we apply "rotateUp(SIBLING)", and change the color of far_nephew,
we get the following tree:

Figure 7. Doing a rotateUp(SIBLING) and then changing the color of
far_nephew fixes the tree.

SIBLING (red)

+++
 
parent (black) far_nephew (black)
 
+++ +++
   
light_node (black) near_nephew (black) T1 (black) T2 (black)
All pairs of sibling subtrees now have the same maximum black height.

As will be seen below, this is the crucial operation
required to repair a tree after a node has been deleted.

With the above groundwork in place, let's look at some code!

Inserting a new node into a redblack tree
To insert a new node, we first insert it into the tree in the correct
position based on the ordering of the data contained in the nodes. The
new node will end up being attached as a leaf node to a boundary node.

We color the new node red. This assures that its presence will not
cause a violation of the Black property. However, if the new node's
parent happens to be red, we will have introduced a violation of the Red property.

As will be seen below, if the new node does introduce a violation of
the red property, we can repair the tree with a single traversal from the
new node toward the root of the tree. Since the tree height is O(log(N)), this
operation will be completed in O(log(N)) time.

One quick implementation detail before looking at the code: We use a
functional notation

isRed(Parent(node))
instead of pointer dereferencing. All functions can take NULL as an
argument and return NULL or false if applied to NULL. The details
of the various helper functions are left to the reader.

Here is the insert algorithm and its associated "restoreRedProperty" algorithm.

void redBlackInsert(RedBlackTree *tree, void *dataValue)
{
RedBlackTree *newNode = treeInsert(tree, dataValue);
if (violatesRedProperty(newNode))
restoreRedProperty(tree, newNode);
}
// Node is red and its parent exists and is red, violating the Red property.
// No Blackproperty violations or other Redproperty violations exist in the tree.
//
// The algorithm gives node a black parent. The rearrangement may cause a
// Red property violation higher in the tree, which is solved by a recursive call.
static void restoreRedProperty(RedBlackTree *tree, RedBlackNode *node)
{
if (isRootNode(parent(node))) { // case 1; all done.
parent(node)>color = BLACK;
} else if (isRedNode(uncle(node))) { // case 2.
parent(node)>color = BLACK;
uncle(node)>color = BLACK;
grandparent(node)>color = RED;
if (violatesRedProperty(grandparent(node)))
restoreRedProperty(tree, grandparent(node)); // possibly recurse..
// else all done.
} else { // case 3; all done after
// ensure that the node is an outside child.. // local changes below.
if (isInsideChild(node)) {
rotateUp(tree, node); // setup for final move;
node = outsideChild(node);
}
rotateUp(tree, parent(node)); // final move to fix the tree;
}
}
An initial easy case is when the parent of the node violating the
Red property is the root. By assumption there are no violations of the Black
property, so NODE and its sibling (if it exists) have the same maximum black height. So,
we can simply make the root black and the tree is then wellformed.

Figure 8. Red property violation occurs at a child of the root.
Since there are no Black property violations, the tree
can be repaired by making the root black.

root (red)

+++
 
NODE (red) optional sibling


v
root (black)

+++
 
NODE (red) optional sibling
The second case is when both NODE's parent and uncle are red nodes.
This is the only case in which recursion may be necessary.

Figure 9. NODE has red parent and uncle. Tree can be
locally repaired by making parent and uncle black, making grandparent
red, and if required calling restoreRedProperty() on grandparent.

grandparent (black)

+++
 
parent (red) uncle (red)

+++
 
NODE (red) optional sibling


v
grandparent (red)

+++
 
parent (black) uncle (black)

+++
 
NODE (red) optional sibling
The third case, in which NODE's uncle is black, permits completion of the algorithm with
no need for recursion. The final move only works if NODE is an outside child, so it may
be necessary to set the table by doing a preliminary move that makes
the site of the Red property violation an outside child. Figure 10 illustrates this.

Figure 10. Modify the tree so that the node containing the Red property violation
is an outside child.

grandparent (black)

+++
 
parent (red) uncle (black)

++
 
sibling (not red) NODE (red)

+++
 
T1 (not red) T2 (not red)


v
grandparent (black)

+++
 
NODE (red) uncle (black)

+++
 
parent (red) T2 (not red)

+++
 
sibling (not red) T1 (not red)
With the guarantee that the uncle node is black, 'rotateUp(parent)' completes the algorithm and
the tree satisfies both the Red property and the Black property.

Figure 11. If NODE is an outside child, a rotateUp() on parent completes the algorithm.

grandparent (black)

+++
 
PARENT (red) uncle (black)

+++
 
node (red) sibling (black)


v
PARENT (black)

+++
 
node (red) grandparent (red)

+++
 
sibling (black) uncle (black)
Deleting a node from a redblack tree
We now turn to node deletion. As with node insertion, we start
the process at a boundary node. We find the node to be deleted in the
tree, and if it is an interior node then the data in its successor is
exchanged with the data in that node. The successor of an interior node
is guaranteed to be a boundary node, so in any case we have the task of
deleting a boundary node.

If the node being deleted is black, it will create a violation of the
Black property (unless it happens to be the root). The deletee might
have a (single) child node. If the deletee does not have a child, or
if the child is black, then we call the function restoreBlackProperty()
to fix the Black property violation.

It is convenient to start the recursive "restoreBlackProperty(deleteMe)"
function at the node being deleted, so we leave deleteMe in the tree
while the tree is being rebalanced. We give deleteMe node a new color,
WHITE, to create the initial violation of the Black property.

After restoreBlackProperty() is complete, we trim the node being deleted
out of the tree. If the deletee has a child node, it is replaced by its
child; if not it is replaced by the empty subtree. restoreBlackPropery()
happens always to leave the node being deleted with a black parent,
so if the deletee has a child node and it is red, there will be no
violation of the Red property after the deletee is replaced by its child.

void redBlackDelete(RedBlackTree *tree, void *dataValue)
{
RedBlackNode *deleteMe = redBlackFind(tree, dataValue);
if (isInternalNode(deleteMe)) {
swapData(deleteMe, successor(deleteMe));
deleteMe = successor(deleteMe);
}
if (!isRootNode(deleteMe) && isBlackNode(deleteMe)) {
deleteMe>color = WHITE;
restoreBlackProperty(tree, deleteMe);
}
replaceChild(tree, parent(deleteMe), deleteMe, childOrEmptySubtree(deleteMe));
}
The essential problem that restoreBlackProperty() needs to solve is that
there are two sibling subtrees with different maximum black heights. The lighter
sibling is the one that contains the node being deleted.

The algorithm changes the tree to rebalance the siblings so that they
end up having equal maximum black heights. One way to do that is to "kick the problem
upstairs", and make the heavier sibling lighter. Then, the siblings
will be equal, but the parent may be lighter than its sibling.
In that case, the routine calls itself recursively on the parent and
works its way up the tree.

There are a few ways the algorithm can avoid recursion and finish the
job by performing local manipulations. The most nontrivial of these
is to use the rewiring shown in Figure 15 below. The other manipulations are
simply intermediate steps to ensure that the conditions required for
that final rewiring are satisfied.

A useful way to look at the algorithm is to start from Figure 15, the
final step, and work backwards. Basically, Figure 15 represents the
checkmate move, and the other moves are simply ways to rearrange things
so that the coup de grace can be executed.

// lightNode's maximum black height is one less than its sibling's maximum black height, violating the Black property.
// No Redproperty violations or other Blackproperty violations exist in the tree.
//
// NODE is not red.
//
// restoreBlackProperty() makes lightNode and its sibling equal in weight. This rearrangement may
// cause a Black property violation higher in the tree, which is solved by a recursive call.
static void restoreBlackProperty(RedBlackTree *tree, RedBlackNode *lightNode)
{
// ensure that sibling is black.
if (isRedNode(sibling(lightNode))) // initial preparation; see Figure 12.
rotateUp(tree, sibling(lightNode));
if ( !isRedNode(nearNephew(lightNode)) // case 1; see Figure 13
&& !isRedNode(farNephew(lightNode)))
{
sibling(lightNode)>color = RED;
if (isRedNode(parent(lightNode))) // case 1a; all done..
parent(lightNode)>color = BLACK;
else if (!isRootNode(parent(lightNode))) { // case 1b;
restoreBlackProperty(tree, parent(lightNode)); // recurse toward root.
}
} else { // case 2; all done after local changes.
// ensure far nephew is red..
if (!isRedNode(farNephew(lightNode)))
rotateUp(tree, nearNephew(lightNode)); // setup for final move; see Figure 14.
farNephew(lightNode)>color = BLACK;
rotateUp(tree, sibling(lightNode)); // all done; see Figure 15.
}
}
As an initial step, restoreBlackProperty() makes sure that the light
node has a black sibling. This is an initialization step that is needed
for both branches of the main "if" statement in restoreBlackProperty().
The tree modification is illustrated in Figure 12.

The fact that SIBLING is heavier than light_node guarantees that both subtrees
of SIBLING are nonempty, and they must both have black root nodes since by
assumption the tree has no Red property violations.

Figure 12. restoreBlackProperty() needs to make sure that light node's
sibling is black. rotateUp(sibling) makes this true.

parent (black)

+++
 
light_node (black) SIBLING (red)

+++
 
T1 (black) T2 (black)


v
SIBLING (black)

+++
 
parent (red) T2 (black)

+++
 
light_node (black) T1 (black)
If sibling (which we now know to be black) has no red children, we can
make sibling and node have the same weight by making sibling lighter.
This is done by simply changing sibling from black to red.

We then have the simple problem of accommodating the change in sibling's
color. If parent was originally red, it can be made black and the
algorithm is complete. If parent was originally black, its maximum black height
is now smaller and it becomes the new light node. If the parent is not
the root it is the new site of a Black property violation, which is fixed
by a recursive call to restoreBlackProperty().

Figure 13. If nephews are both NIL or black, we can restore balance by
making sibling red. If parent was red, we are done. If parent was black,
recursively call restoreBlackProperty().

parent (red) parent (black)
 ===> 
+++ +++
   
light_node (black) sibling (black) light_node (black) sibling (red)
 
+++ +++
   
(not red) (not red) (not red) (not red)
parent (black) new light_node (black); recurse here.
 ===> 
+++ +++
   
light_node (black) sibling (black) light_node (black) sibling (red)
 
+++ +++
   
(not red) (not red) (not red) (not red)
The other branch of the main conditional in restoreBlackProperty() is
that at least one of the nephew nodes is red. If it happens that the
nephew farthest from node is red, the algorithm can perform its ultimate
step and complete its job, as shown in Figure 15.

If far_nephew is not red, we need to change it to be red, as a setup
for the algorithm's ultimate step. It is a simple matter to make it so
using a rotateUp on near_nephew.

Figure 14. If far_nephew is not red (needed for the final step),
rotateUp(near_nephew) will make it so.

parent (either)

+++
 
light_node (black) sibling (black)

+++
 
NEAR_NEPHEW (red) far_nephew (black)

+++
 
T1 (black) T2 (black)


v
parent (either)

+++
 
light_node (black) NEAR_NEPHEW (black)

+++
 
T1 (black) sibling (red)

+++
 
T2 (black) far_nephew (black)
The final step completes the algorithm. It restores the Black property
locally, and does so in a way that does not change the maximum black height of the
parent.

Figure 15. If far_nephew is red, then rotateUp(sibling) and a change
of far_nephew to black completes the algorithm.

parent (either)

+++
 
light_node (black) SIBLING (black)

+++
 
near_nephew (either) far_nephew (red)


v
SIBLING (either)

+++
 
parent (black) far_nephew (black)

+++
 
light_node (black) near_nephew (either)
Afterword
The above presentation was meant to be tutorial, and did not refer to
the differences between the standard presentations of redblack trees
and this one. In this afterword we mention some of those differences.

We make use of a generalization of the usual notion of maximum black height.
In the standard definition, the maximum black height of a node is only considered
well defined if all simple paths from the node to leaf nodes have the
same number of black nodes. We prefer to define the maximum black height of
a node as the maximum number of black nodes in any simple path staring
from the node and descending into its subtree. With that definition,
maximum black height is welldefined for all nodes in all trees, even ones that
violate the black property.

We give an alternative formulation of the Black property in terms
of sibling nodes. This fits well with the preconditions of the
restoreBlackProperty() algorithm. The exact location of a violation
of the Black property becomes welldefined. If a node has
children with different maximum black heights, the node's maximum black height is defined in
terms of the larger child maximum black height. This fits well with the deletion
algorithm, because all nodes outside the subtree that is the lighter of
the unequal sibling pair retain their original maximum black heights. However,
this new definition creates the need for a proof that the standard black
property is equivalent. The latter is of course essential to ensure that
redblack trees have the desired O(log(N)) algorithms.

We introduce no notion of artificial NIL leaf nodes. We make no
assumption in the definitions or the code that all nonleaf nodes have
two children. We use the notion of boundary nodes instead of leaf nodes.

We make no requirement that the root node be red.

We define a single rotateUp() operation that applies to a child
node instead of defining a pair of left and right rotate() operations
that apply to a parent node.

We use directionagnostic naming of relative node positions. Together
with rotateUp(), this allows us to avoid mentions of left or right
anywhere in the algorithms except in a couple of lowlevel utility
routines.

We define restoreRedProperty() and restoreBlackProperty() recursively
instead of iteratively. The entry conditions of these routines are
carefully stated and serve the same function a loop invariants in the
iterative routines.

In the delete operation we leave the node to be deleted in the tree while
the Black property is being restored. It is helpful in the algorithm to
start at that node and work upward in the tree. To create the initial
violation of the Black property, we assign this node a third color.

Please let me know if you see any errors or problems in this presentation. Thanks!

gregfjohnson at a big company that provides email and whose name starts with y dot com.
