Cuite Design (1/?): QObject in OCaml

2020-05-10 18:39:59+02:00

Two years ago, I worked on "Cuite", an OCaml binding to Qt5. The project stalled once all the core concepts had been mapped to OCaml. The remaining work was very repetitive: going through the vast hierarchy of Qt classes and binding each method, while accommodating the occasional ad-hoc behavior.

There are also some shortcomings in my approach that need to be revisited:

The mapping between C++ and OCaml types is quite ad-hoc. There is no principled way to handle all the variations (some types behave like values, some like references, some exist as part of a graph, some make sense on their own, etc.).
The runtime support library relies heavily on internals of the OCaml runtime and would benefit from a cleanup.
The lack of ad-hoc polymorphism means that C++ method invocation has to be very explicit (e.g., foo->setBar(baz) translates to Foo.setBar foo baz). Also, the huge number of methods sometimes significantly slows down compilation.

This post is the first in a series where I explain the thoughts that went into the design of the library and how these issues are addressed.

Exposing QObjects

QObject is the root of the main class hierarchy in Qt. It is used everywhere: all widgets are QObject instances.

The binding needs to expose QObject classes, instances, and functions to OCaml programs. In this post, we will take a look at memory management: how QObjects are allocated and released when manipulated from OCaml.

There are a few properties that I wanted the binding to preserve. This is subjective; another binding might look for different properties. Here is what I was looking for:

Runtime safety. Incorrect use of the API should translate to an exception, not to a segmentation fault or memory corruption.
Automatic memory management with opt-out. Most of the time, the programmer should not worry about memory management. Occasionally, they might want to ensure that memory is released on time. For instance, when allocating large objects such as a picture, it is nice to release memory as early as possible.
No arbitrary restrictions or ad-hoc rules for objects (unless there is no alternative). Programmers should not worry about cyclic references or have to manage certain objects differently (except maybe for performance reasons).
QObjects should interact well with other OCaml features. Physical equality, ordering, and hashing should make sense.

I ended up with a scheme that provides all these properties to the binding. The rest of the post focuses on memory management for QObjects.

QObject values

Each QObject instance visible from the OCaml program is mapped to a unique value. This graph shows all the infrastructure involved.

[Figure: Exposing a QObject to the OCaml world]

An instance QObject *obj is made accessible from OCaml code via the mlproxy value. In other words, we want the functions:

value Val_QObject(QObject *obj);
QObject *QObject_val(value v);

QObject_val: from value to QObject

The OCaml block mlproxy contains a pointer to an object cproxy in the C++ heap. In turn cproxy has a pointer to obj, the QObject.

To get to the QObject from the OCaml value, we just need to follow two pointers.
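The two-pointer walk can be sketched as follows. This is a minimal illustration, not Cuite's actual code: QObject, CProxy, and MLProxy are plain C++ stand-ins (an assumption made so the snippet compiles without Qt or the OCaml runtime), and QObject_val takes the stand-in struct rather than an OCaml value.

```cpp
#include <cassert>

// Stand-in for Qt's QObject, so the sketch compiles without Qt.
struct QObject {};

// C++-side proxy: lives in the C++ heap and points to the QObject.
struct CProxy {
    QObject *obj;
};

// Stand-in for the OCaml block. In a real binding this would be an OCaml
// `value` whose payload is the CProxy pointer.
struct MLProxy {
    CProxy *cproxy;
};

// Follow the two pointers: mlproxy -> cproxy -> obj.
QObject *QObject_val(const MLProxy &v) {
    assert(v.cproxy != nullptr);  // a proxy value always carries a CProxy
    return v.cproxy->obj;
}
```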

Handling QObject destruction

We need to keep track of when the QObject is deleted: the OCaml value might still be reachable, and we don't want to accidentally dereference the QObject past that point.

This is not too difficult. We can either:

Use a QPointer<QObject> instead of a QObject*: Qt will clear the QPointer on object deletion.
Listen to the destroyed signal of the QObject.
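The second option can be sketched without Qt as follows. FakeQObject and its onDestroyed method are hypothetical stand-ins for QObject and its destroyed signal, and checked_object is an illustrative helper name. Throwing a C++ exception is only a placeholder for raising an OCaml exception, in line with the runtime-safety goal above.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for QObject: runs registered callbacks from its
// destructor, loosely mimicking the `destroyed` signal.
class FakeQObject {
public:
    ~FakeQObject() {
        for (auto &cb : on_destroyed_) cb();
    }
    void onDestroyed(std::function<void()> cb) {
        on_destroyed_.push_back(std::move(cb));
    }
private:
    std::vector<std::function<void()>> on_destroyed_;
};

struct CProxy {
    FakeQObject *obj;

    explicit CProxy(FakeQObject *o) : obj(o) {
        assert(o != nullptr);
        // Clear our pointer when the object dies. The callback captures
        // `this`, so this sketch assumes the proxy outlives the object.
        o->onDestroyed([this] { obj = nullptr; });
    }
    // The callback holds the proxy's address, so forbid copies.
    CProxy(const CProxy &) = delete;
    CProxy &operator=(const CProxy &) = delete;
};

// Return the object, or fail loudly instead of dereferencing a dead pointer.
FakeQObject *checked_object(const CProxy &proxy) {
    if (proxy.obj == nullptr)
        throw std::runtime_error("QObject used after deletion");
    return proxy.obj;
}
```

With the first option, the obj field would instead be a QPointer<QObject>, and the null check in checked_object would stay the same.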

From QObject to CProxy

The Val_QObject function will be invoked many times, and we don't want to create a new proxy each time. The ProxyTable remembers the CProxy associated with each QObject. It is a hash table indexed by object addresses. It is populated by the helper function:

static CProxy *QObject_proxy(QObject *obj);

QObject_proxy starts by looking up the object's address in the hash table. If a valid proxy is found, it is returned. Otherwise, the object has not yet been exported to the OCaml world: we allocate and initialize a new CProxy and add it to the table. The weakid field is initialized to -1.
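A minimal sketch of this lookup-or-create step, using std::unordered_map as the hash table. The QObject stand-in, the proxy_table name, and the CProxy layout are assumptions. So is the reading of "valid": here it means that the stored proxy still points to the same object.

```cpp
#include <cassert>
#include <unordered_map>

struct QObject {};  // stand-in for Qt's QObject

struct CProxy {
    QObject *obj;   // cleared when the QObject is destroyed
    long weakid;    // index into the OCaml-side weak table, -1 if none yet
};

// Hash table indexed by object address (the ProxyTable of the text).
static std::unordered_map<const QObject *, CProxy *> proxy_table;

static CProxy *QObject_proxy(QObject *obj) {
    assert(obj != nullptr);
    auto it = proxy_table.find(obj);
    // A hit is only trusted if the proxy still refers to this object; a
    // stale entry could remain if an address is reused after deletion.
    if (it != proxy_table.end() && it->second->obj == obj)
        return it->second;
    // Not exported yet: allocate, initialize, and register a new proxy.
    // A stale proxy is not freed here, since an OCaml value may still
    // point to it.
    CProxy *proxy = new CProxy{obj, -1};
    proxy_table[obj] = proxy;
    return proxy;
}
```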

From CProxy to value: the weakid field

We have a CProxy, but not yet an OCaml value. The weakid field is an index into the WeakTable, a global OCaml table that holds weak references to MLProxy values:

If the field is not -1, a cell is already allocated. We can look directly in the weak table.
If the field is -1, we allocate and initialize a new MLProxy value that points to the CProxy and index it in the weak table.

The allocation and indexing are done by a primitive exported by the OCaml code, which also registers a finalizer to clean up unreachable objects:

val finalize_and_index : ml_proxy -> int

Why jump through the hoops of a weak table? C++ code needs to reach the OCaml values, but ordinary GC roots are strong references: registering each MLProxy as a root would prevent the GC from ever collecting it.
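The weakid logic can be modeled in plain C++ by letting std::shared_ptr play the role of GC reachability and std::weak_ptr the role of a weak table cell. This is only an analogy under those assumptions: the real table lives on the OCaml side, and proxy_value and weak_table are hypothetical names. Reusing a cell whose value has already been collected is also a choice made for this sketch; in the scheme described above, the finalizer is responsible for cleanup.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct CProxy {
    long weakid = -1;  // -1 means no weak table cell allocated yet
};

// Stand-in for the OCaml block. Being owned by a shared_ptr models being
// reachable from the OCaml program.
struct MLProxy {
    CProxy *cproxy;
};

// Stand-in for the global weak table: it can find a value that is still
// alive, but does not keep it alive.
static std::vector<std::weak_ptr<MLProxy>> weak_table;

std::shared_ptr<MLProxy> proxy_value(CProxy *proxy) {
    assert(proxy != nullptr);
    if (proxy->weakid != -1) {
        // A cell is already allocated: look directly in the weak table.
        auto &cell = weak_table[static_cast<std::size_t>(proxy->weakid)];
        if (auto live = cell.lock())
            return live;
    }
    // No live value: allocate a new MLProxy pointing to the CProxy.
    auto fresh = std::make_shared<MLProxy>(MLProxy{proxy});
    if (proxy->weakid == -1) {
        weak_table.emplace_back(fresh);
        proxy->weakid = static_cast<long>(weak_table.size() - 1);
    } else {
        // Sketch-only choice: reuse the cell of a collected value.
        weak_table[static_cast<std::size_t>(proxy->weakid)] = fresh;
    }
    return fresh;
}
```

If weak_table held std::shared_ptr instead, every MLProxy would stay alive for as long as the table did, which is the problem with strong roots.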

QObject_val/Val_QObject: ✔️

We now have both functions:

value Val_QObject(QObject *obj);
QObject *QObject_val(value v);

They:

can convert from value to QObject and from QObject to value
safely handle explicit QObject deletion
enable automatic deletion of unreachable objects