Imperative Balanced Binary Trees, Part 1: Core Balancing
2023-03-05 08:40:00 CET
I used to find balanced trees tedious to deal with in an imperative setting, particularly in C, though I love them in functional languages.
In this series, I will describe an implementation strategy that achieves all I could hope for in a C implementation: safe, modular, reasonably fast, independent of an allocation strategy, easy to extend... and reasonably simple.
Balancing is achieved by calling a single helper function. Other common functions, like BST insertion and deletion, can be derived from it.
This idea is not new: in Parallel Ordered Sets Using Join, Blelloch et al use a similar approach as a building block for parallel implementations of binary trees, and afaik, the core idea goes back to Functional Pearls: Efficient Sets—a Balancing Act by Adams. But these papers are concerned with immutable sets, and this blog post is about adapting the method to C.
The series is split into 4 posts:
- This post implements balancing and exposes the modular interface.
- In part 2, we will implement binary search trees (BST).
- In part 3, we show how to safely maintain additional structural invariants.
- In part 4, we extend the approach to other balancing criteria.
Height-balanced Trees
We start by defining a simple flavor of height-balanced trees. Here is a node:
struct node {
struct node *left, *right;
int height;
};
A node is represented by a struct node *
. NULL
denotes the empty tree. A non-empty node has two subtrees and keeps track of its height.
Next, we implement a node_height
function to get the height while handling the empty case, and an initialization function that maintains the height invariant:
// Returns the height of node, handling the NULL case
static int node_height(struct node *n)
{
return n ? n->height : 0;
}
// precondition: self != NULL
// left and right can be NULL to denote empty sub-trees.
// set_node(self, left, right) setups self so that it represents
// a tree with left and right subtrees and the correct height,
// and returns self.
static struct node *set_node(struct node *self,
struct node *left, struct node *right)
{
self->left = left;
self->right = right;
int lh = node_height(left);
int rh = node_height(right);
self->height = 1 + ((lh > rh) ? lh : rh);
return self;
}
No other function should directly mutate a struct node
, so we can be confident the height is always set correctly.
Constructing balanced nodes
We will use the height field to enforce the same criterion as the one maintained by AVL trees: a tree is balanced if (1) its two sub-trees are balanced, (2) their heights differ by at most one.
// precondition: max_height >= min_height
// is_balanced(min_height, max_height) returns true if a tree
// with sub-nodes of these heights is balanced
static bool is_balanced(int min_height, int max_height)
{
return (max_height - min_height) <= 1;
}
By relying on "smart constructors", the tree rotation functions can be given a functional feeling:
typedef struct node *
make_fun(struct node *self, struct node *left, struct node *right);
static struct node *
rot_left(make_fun make, struct node *self,
struct node *left, struct node *right)
{
return make(right, make(self, left, right->left), right->right);
}
static struct node *
rot_right(make_fun make, struct node *self,
struct node *left, struct node *right)
{
return make(left, left->left, make(self, left->right, right));
}
Now, the tree balancing functions. To make a balanced node, we check if the two children are already balanced. If yes, not much needs to be done: we directly apply the constructor. If not, we go down the largest sub-tree, and try again.
static struct node *
node_left(struct node *self, struct node *left, struct node *right)
{
if (is_balanced(node_height(left), node_height(right)))
return set_node(self, left, right);
if (right && node_height(right->right) < node_height(right->left))
right = rot_right(node_left, right, right->left, right->right);
return rot_left(node_left, self, left, right);
}
static struct node *
node_right(struct node *self, struct node *left, struct node *right)
{
if (is_balanced(node_height(right), node_height(left)))
return set_node(self, left, right);
if (left && node_height(left->left) < node_height(left->right))
left = rot_left(node_right, left, left->left, left->right);
return rot_right(node_right, self, left, right);
}
struct node *make_tree(struct node *self, struct node *left, struct node *right)
{
if (node_height(left) <= node_height(right))
return node_left(self, left, right);
else
return node_right(self, left, right);
}
And that's it for the balancing! To sum up, the two big tasks are:
- first, implementing a balancing criterion using a function
set_node
that maintains some metric used by another function,is_balanced
, to check if the criterion is satisfied - then, the
make_tree
function constructs trees balanced according to the criterion
Note that this code does not need to do any memory management: all functions take the resulting tree as an argument (a style sometime called "destination-passing").
Result
You can download the implementation we reached so far (under MIT license).
#ifndef BALANCED_H
#define BALANCED_H
struct node {
struct node *left, *right;
int height;
};
struct node *
make_tree(struct node *self, struct node *left, struct node *right);
#endif /*!BALANCED_H*/
The interface is remarkably simple, isn't it? In the next post, we will implement binary search trees on top of this interface.
Here is a simple test program (test.c, Makefile), to see how this can be used to make custom balanced datastructures:
#include <stdlib.h>
#include <stdio.h>
#include "balanced.h"
// Always insert at the right end of the tree
struct node *insert_right(struct node *current, struct node *new)
{
if (!current)
return new;
return make_tree(current, current->left, insert_right(current->right, new));
}
// A simple printer, to visually confirm the trees are balanced
void print_node(int indent, struct node *node)
{
for (int i = 0; i < indent; ++i)
fputc(' ', stdout);
fputs("- ", stdout);
if (!node)
fputs("leaf\n", stdout);
else
{
fprintf(stdout, "node, height=%d\n", node->height);
print_node(indent + 1, node->left);
print_node(indent + 1, node->right);
}
}
int main(int argc, char **argv)
{
struct node nodes[1024];
struct node *root = NULL;
for (int i = 0; i < 1024; ++i)
{
// Initialize an empty node
struct node *node = make_tree(&nodes[i], NULL, NULL);
// Insert it at the right end
root = insert_right(root, node);
}
// Print resulting tree
print_node(0, root);
return 0;
}
Or here is a variant to insert in random locations:
struct node *insert_random(int seed, struct node *current, struct node *new)
{
if (!current)
return new;
if ((seed & 3) == 0)
current->left = insert_random(seed >> 1, current->left, new);
else
current->right = insert_random(seed >> 1, current->right, new);
return make_tree(current, current->left, current->right);
}
Try it by replacing insert_right(root, node)
with insert_random(rand(), root, node)
.