Imperative Balanced Binary Trees, part 1: Core balancing

2023-03-05 08:40:00 CET

I used to find balanced trees tedious to deal with in an imperative setting, particularly in C, though I love them in functional languages.

In this series I will describe an implementation strategy that achieves all I could hope for in a C implementation: safe, modular, reasonably fast, independent of an allocation strategy, easy to extend... and reasonably simple.

Balancing is achieved by calling a single helper function. Other common functions, like BST insertion and deletion, can be derived from it.

This idea is not new: in Parallel Ordered Sets Using Join, Blelloch et al use a similar approach as a building block for parallel implementations of binary trees, and afaik the core idea goes back to Functional Pearls Efficient sets—a balancing act by Adams. But these papers are concerned with immutable sets, and this blog post is about adapting the method to C.

The series is split in 4 posts:

Height-balanced trees

We start by defining a simple flavor of height-balanced trees. Here is a node:

struct node {
  struct node *left, *right;
  int height;
};

A node is represented by a struct node *. NULL denotes the empty tree. A non-empty node has two subtrees and keeps track of its height.

Next, we implement a node_height function to get the height while handling the empty case, and an initialization function that maintains the height invariant:

// Returns the height of node, handling the NULL case
static int node_height(struct node *n)
{
  return n ? n->height : 0;
}

// precondition: self != NULL
// left and right can be NULL to denote empty sub-trees.
// set_node(self, left, right) setups self so that it represents
// a tree with left and right subtrees and the correct height,
// and returns self.
static struct node *set_node(struct node *self,
                             struct node *left, struct node *right)
{
  self->left = left;
  self->right = right;
  int lh = node_height(left);
  int rh = node_height(right);
  self->height = 1 + ((lh > rh) ? lh : rh);
  return self;
}

No other function should directly mutate a struct node, so we can be confident the height is always set correctly.

Constructing balanced nodes

We will use the height field to enforce the same criterion as the one maintained by AVL trees: a tree is balanced if (1) its two sub-trees are balanced, (2) their heights differ by at most one.

// precondition: max_height >= min_height
// is_balanced(min_height, max_height) returns true if a tree 
// with sub-nodes of these heights is balanced
static bool is_balanced(int min_height, int max_height)
{
  return (max_height - min_height) <= 1;
}

By relying on "smart constructors", the tree rotation functions can be given a functional feeling:

typedef struct node *
make_fun(struct node *self, struct node *left, struct node *right);

static struct node *
rot_left(make_fun make, struct node *self,
         struct node *left, struct node *right)
{
  return make(right, make(self, left, right->left), right->right);
}

static struct node *
rot_right(make_fun make, struct node *self,
          struct node *left, struct node *right)
{
  return make(left, left->left, make(self, left->right, right));
}

Now, the tree balancing functions. To make a balanced node, we check if the two children are already balanced. If yes, not much needs to be done: we directly apply the constructor. If not, we go down the largest sub-tree, and try again.

static struct node *
node_left(struct node *self, struct node *left, struct node *right)
{
  if (is_balanced(node_height(left), node_height(right)))
    return set_node(self, left, right);
  if (right && node_height(right->right) < node_height(right->left))
    right = rot_right(node_left, right, right->left, right->right);
  return rot_left(node_left, self, left, right);
}

static struct node *
node_right(struct node *self, struct node *left, struct node *right)
{
  if (is_balanced(node_height(right), node_height(left)))
    return set_node(self, left, right);
  if (left && node_height(left->left) < node_height(left->right))
    left = rot_left(node_right, left, left->left, left->right);
  return rot_right(node_right, self, left, right);
}

struct node *make_tree(struct node *self, struct node *left, struct node *right)
{
  if (node_height(left) <= node_height(right))
    return node_left(self, left, right);
  else
    return node_right(self, left, right);
}

And that's it for the balancing! To sum up, the two big tasks are:

Note that this code does not need to do any memory management: all functions take the resulting tree as an argument (a style sometime called "destination-passing").

Result

You can download the implementation we reached so far (under MIT license).

balanced.c, balanced.h:

#ifndef BALANCED_H
#define BALANCED_H

struct node {
  struct node *left, *right;
  int height;
};

struct node *
make_tree(struct node *self, struct node *left, struct node *right);

#endif /*!BALANCED_H*/

The interface is remarkably simple, isn't it? In the next post, we will implement binary search trees on top of this interface.

Here is a simple test program (test.c, Makefile), to see how this can be used to make custom balanced datastructures:

#include <stdlib.h>
#include <stdio.h>
#include "balanced.h"

// Always insert at the right end of the tree
struct node *insert_right(struct node *current, struct node *new)
{
    if (!current)
        return new;

    return make_tree(current, current->left, insert_right(current->right, new));
}

// A simple printer, to visually confirm the trees are balanced 
void print_node(int indent, struct node *node)
{
    for (int i = 0; i < indent; ++i)
        fputc(' ', stdout);

    fputs("- ", stdout);

    if (!node)
        fputs("leaf\n", stdout);
    else
    {
        fprintf(stdout, "node, height=%d\n", node->height);
        print_node(indent + 1, node->left);
        print_node(indent + 1, node->right);
    }
}

int main(int argc, char **argv)
{
    struct node nodes[1024];
    struct node *root = NULL;

    for (int i = 0; i < 1024; ++i)
    {
        // Initialize an empty node
        struct node *node = make_tree(&nodes[i], NULL, NULL);
        // Insert it at the right end
        root = insert_right(root, node);
    }
    // Print resulting tree
    print_node(0, root);
    return 0;
}

Or here is a variant to insert in random locations:

struct node *insert_random(int seed, struct node *current, struct node *new)
{
    if (!current)
        return new;

    if ((seed & 3) == 0)
        current->left = insert_random(seed >> 1, current->left, new);
    else
        current->right = insert_random(seed >> 1, current->right, new);
    return make_tree(current, current->left, current->right);
}

Try it by replacing insert_right(root, node) with insert_random(rand(), root, node).