Red-Black tree

A Red-Black tree is a self-balancing binary search tree. The main difference to a normal binary tree is an extra bit that stores the parity (the color) of each node. This bit can be either be 0 (black) or 1 (red). An example of a red-black tree is shown in Figure 1 (Hardcoded)

rb-tree.png

The following invariant must be satisfied every time an operation is being made on node of a Red-Black tree:

  1. Every node is either red or black
  2. The root is always black
  3. Every leaf is black
  4. If a node is red, then both its children are black
  5. All paths from a node to descendant nodes contain the same number of black nodes

The node of a Red-Black tree can be described by the following data type:

In [8]:
from enum import Enum

class RBNode:
    
    def __init__(self, color, key, left, right, parent):
        self.left = left
        self.right = right
        self.parent = parent
        self.key = key
        self.color = color

    def __str__(self):
        return str("(%s)%s" % (self.color, self.key))
    
class NodeColors(Enum):
    BLACK = 0;
    RED = 1;

The data structure of the tree is nothing more than a collection of nodes with the root node as the starting point. The constructor initializes a root node to a provided value

In [9]:
class RBTree:

    def __init__(self, key):
        self.nil = RBNode(NodeColors.BLACK, None, None, None, None)
        self.root = RBNode(NodeColors.BLACK, key, self.nil, self.nil, self.nil)

Your task for this assignment is to construct algorithms that manipulate Red-Black trees.

Constructing a Red-Black Tree

We start constructing a Red-Black tree by calling the constructor

tree = RBTree(5)

Inserting a node consists of three steps:

  1. Finding the appropriate branch to insert the node in the tree
  2. Inserting the node and creating links to the tree
  3. Rebalancing the tree

For Step 1, the procedure is not much different from a normal binary search tree. Specifically, the algorithm first finds the appropriate position for the new node by searching through the tree. In the following snippet, z is the value we would like to insert. However, the snippet has a mistake.

def search(self, tree, z):
    y = tree.nil
    x = tree.root
    while x != tree.nil:
        y = x
        if z.key < x.key:
            x = x.left
        else:
            y = x.right
    return y

T1 (10 points): Spot and fix the mistake in the search method

In [10]:
def search(tree, z):
    y = tree.nil
    x = tree.root
    while x != tree.nil:
        y = x
        if z.key < x.key:
            x = x.left
        else:
#             y = x.right
            x = x.right
    return y

Using the (hopefully correct by now!) search method, we can locate the node to insert our new node as a child. This happens by creating the node and linking it to the appropriate super/sub nodes. Note that for new nodes, the starting color is always RED.

T2 (20 points): Write code to construct and link a new node. Hint: Use the search method inside the insert method. Hint: Make the method return the new node, this is usefull for later

In [11]:
def insert(tree, z):
    newNode = RBNode(NodeColors.RED, z, tree.nil, tree.nil, None)
    parent = search(tree, newNode)
        
    if newNode.key < parent.key:
        parent.left = newNode
    else:
        parent.right = newNode
        
    newNode.parent = parent;
    return newNode
In [12]:
#Test you implementation here
tree = RBTree(5)
insert(tree, 10)

print(tree.root.right)
(NodeColors.RED)10

After inserting the node, the tree potentially needs to be rebalanced. An example of why this is so can be seen in the following plot. What we see here is that the insertion of node 4 violates invariant number 4.

rb-invariant-violation.jpg

The rebalancing operation can be quite complicated; the pseudocode below takes care of all the cases.

proceduce rb_balance(T, z)
'' T is an RBTree
'' z is the newly inserted node

while z.parent.color == RED
    if z.parent == z.parent.parent.left
        y <- z.parent.parent.right

        if y.color == RED
            z.parent.color <- BLACK
            y.color <- BLACK
            z.parent.parent.color <- RED
            z <- z.parent.parent
        else 
            if z == z.parent.right
                z <- z.parent
                left_rotate(T, z)
            z.parent.color <- BLACK
            z.parent.parent.color <- RED
            right_rotate(T, z.parent.parent)
    else
      y <- z.parent.parent.left
       if y.color == RED
            z.parent.color <- BLACK
            y.color <- BLACK
            z.parent.parent.color <- RED
            z <- z.parent.parent
        else 
            if z == z.parent.left
                z <- z.parent
                right_rotate(T, z)
            z.parent.color <- BLACK
            z.parent.parent.color <- RED
            left_rotate(T, z.parent.parent)
T.root.color <- BLACK

You might be wondering with left_rotate and right_rotate are. They are functions that represent common operations on trees. To understand how they work have a look at the plot below.

rb-tree-rotation.png

Below you can find the Python code for the left_rotate method.

In [13]:
def left_rotate(tree, x):
    y = x.right
    x.right = y.left

    if y.left != tree.nil:
        y.left.parent = x
    y.parent = x.parent
    
    if x.parent == tree.nil:
        tree.root = y
    elif x == x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x
    x.parent = y

T3 (20 points): Implement the code for right_rotate

In [14]:
def right_rotate(tree, x):
    y = x.left
    x.left = y.right

    if y.right != tree.nil:
        y.right.parent = x
    y.parent = x.parent
    
    if x.parent == tree.nil:
        tree.root = y
    elif x == x.parent.right:
        x.parent.right = y
    else:
        x.parent.left = y
    y.right = x
    x.parent = y
In [15]:
tree = RBTree(5)
insert(tree, 3)
insert(tree, 2)
insert(tree, 4)
insert(tree, 7)

print(tree.root)
print(tree.root.left)
print(tree.root.left.left)
print(tree.root.left.right)
print(tree.root.right)
print()

right_rotate(tree, tree.root)

print(tree.root)
print(tree.root.left)
print(tree.root.right)
print(tree.root.right.left)
print(tree.root.right.right)
(NodeColors.BLACK)5
(NodeColors.RED)3
(NodeColors.RED)2
(NodeColors.RED)4
(NodeColors.RED)7

(NodeColors.RED)3
(NodeColors.RED)2
(NodeColors.BLACK)5
(NodeColors.RED)4
(NodeColors.RED)7

T4 (25 points): Implement the rebalancing pseudocode in Python

In [16]:
def balance(tree, z):
        while z.parent.color == NodeColors.RED:
            
            if z.parent == z.parent.parent.left:
                y = z.parent.parent.right
                if y.color == NodeColors.RED:
                    z.parent.color = NodeColors.BLACK
                    y.color = NodeColors.BLACK
                    z.parent.parent.color = NodeColors.RED
                    z = z.parent.parent
                else:
                    if z == z.parent.right:
                        z = z.parent
                        left_rotate(tree, z)
                    z.parent.color = NodeColors.BLACK
                    z.parent.parent.color = NodeColors.RED
                    right_rotate(tree, z.parent.parent)

            else:
                y = z.parent.parent.left
                if y.color == NodeColors.RED:
                    z.parent.color = NodeColors.BLACK
                    y.color = NodeColors.BLACK
                    z.parent.parent.color = NodeColors.RED
                    z = z.parent.parent
                else:
                    if z == z.parent.left:
                        z = z.parent
                        right_rotate(tree, z)
                    z.parent.color = NodeColors.BLACK
                    z.parent.parent.color = NodeColors.RED
                    left_rotate(tree, z.parent.parent)
            
        tree.root.color = NodeColors.BLACK
In [17]:
#Test you implemention here
tree = RBTree(5)
first_node = insert(tree, 10)
balance(tree, first_node)

second_node = insert(tree, 15)
balance(tree, second_node)

print(None)
print(tree.root)
print(tree.root.left)
print(tree.root.right)
print()

third_node = insert(tree, 12)
balance(tree, third_node)
print(tree.root)
print(tree.root.left)
print(tree.root.right)
print(tree.root.right.left)
None
(NodeColors.BLACK)10
(NodeColors.RED)5
(NodeColors.RED)15

(NodeColors.BLACK)10
(NodeColors.BLACK)5
(NodeColors.BLACK)15
(NodeColors.RED)12

Validating the Red-Black tree invariants

T5 (25 points): Create a method that will exhaustively search whether the five invariants described above are maintained in the tree.

In [69]:
import queue

def check(tree):
    "@return: True, only if tree satisfies all the Red-Black tree invariants"
    if not either_red_black(tree):
        print("Not all nodes either red or black")
        return False
    
    if not root_is_black(tree):
        print("Root is not black")
        return False
    
    if not every_leaf_black(tree):
        print("Not every leaf is black")
        return False
    
    if not red_node_children_black(tree):
        print("There is/are red node(s) with non-black children")
        return False
    
    if not same_number_black(tree):
        print("Not every path down had the same amount of black nodes")
        return False
    
    print("All invariants are met")
    return True
    

def either_red_black(tree):
    q = queue.Queue()
    q.put(tree.root)
    
    while (not q.empty()):
        current = q.get()
        
        if current == None:
            continue
        
        if current.color != NodeColors.BLACK and current.color != NodeColors.RED:
            return False

        q.put(current.left)
        q.put(current.right)
    
    return True

def root_is_black(tree):
    return tree.root.color == NodeColors.BLACK

def every_leaf_black(tree):
    q = queue.Queue()
    q.put(tree.root)
    
    while( not q.empty()):
        current = q.get()
        
        if current == None:
            continue
        
        if current.left == None and current.right == None:
            if current.color != NodeColors.BLACK:
                return False
        
        q.put(current.left)
        q.put(current.right)
        
    return True

def red_node_children_black(tree):
    q = queue.Queue()
    q.put(tree.root)
    
    while( not q.empty()):
        current = q.get()
        
        if current == None:
            continue
        
        if current.color == NodeColors.RED:
            if current.left.color == NodeColors.RED:
                return False
            elif current.right.color == NodeColors.RED:
                return False
        
        q.put(current.left)
        q.put(current.right)
        
    return True

def same_number_black(tree):

    try:
        __same_number_black(tree.root)
    except:
        return False
    
    return True
    
def __same_number_black(node):
    if node == tree.nil:
        return 1

    left = __same_number_black(node.left)
    right = __same_number_black(node.right)

    if left != right:
        print("ErrorError")
        raise InvariantException()

    if node.color == NodeColors.BLACK:
        left += 1

    return left
In [70]:
#Test all of the five invariants here
import copy

print(tree.root.color)
print(check(tree), "\n")

tree1 = copy.deepcopy(tree)
wrongNode = insert(tree1, 55)
wrongNode.color = None
print(check(tree1), "\n")

tree2 = copy.deepcopy(tree)
tree2.root.color = NodeColors.RED
print(check(tree2), "\n")

tree3 = copy.deepcopy(tree)
wrongNode = insert(tree3, 55)
wrongNode.left = None
wrongNode.right = None
print(check(tree3), "\n")

tree4 = copy.deepcopy(tree)
wrongNode = insert(tree4, 55)
wrongNode.parent.color = NodeColors.RED
print(check(tree4), "\n")

tree5 = copy.deepcopy(tree)
wrongNode = insert(tree5, 55)
wrongNode.color = NodeColors.BLACK
print(check(tree5), "\n")
NodeColors.BLACK
Not every path down had the same amount of black nodes
False 

Not all nodes either red or black
False 

Root is not black
False 

Not every leaf is black
False 

There is/are red node(s) with non-black children
False 

Not every path down had the same amount of black nodes
False 

Proving performance properties

T6 (10 points): Prove that the average and worse case complexity for the insertion in Red-Black trees is $O(log(n))$.

References

  1. A Red-Black tree in Java
  2. A Red-Black tree in C
  3. Wikipedia on Red-Black Trees