A typeof operator in OCaml

2021-06-25 21:26:20+09:00

Let’s say one is implementing a source to source rewriter for OCaml (a preprocessor, like a PPX library) and needs to manipulate the type of an expression. They don’t want to execute the expression, just want to refer to its type, something like a type of <expr> operator.

OCaml lets you bind the type of a sub-expression to a variable, e.g. (<expr> : 'my_var), and you can then refer to 'my_var in the rest of the expression. But can we do the same in a module and bind the type to a type constructor?

In this blog post, I will give a syntactic construction to realize the "type of" operator:

type t = [%typeof expr]

like we can already do for module:

module type T = module type of M

Menhir, the parser generator, needs something similar to infer the type of semantic actions and non-terminals. It is useful to improve usability and necessary for the inspection features (see ‘Inspection API’). To do so, it runs ocaml a first time in isolation to infer the interface of a specially crafted file, then it parses that interface. This complicates Menhir and the build process significantly (see ‘Interaction with build systems’) and makes it less flexible. Could we do the same in a single pass, directly in OCaml code?

Invocation of C(++)thulhu

It turns out that the following encoding does just that:

type my_type = [%type_of <<some_expr>>]

~=

include (
  (functor (M : sig module type T module X : T end) -> M.X)
    (struct
      let some_expr () = <<some_expr>>
      module type T0 = sig type my_type end
      module X = (val (
          (fun (type a) (_ : unit -> a) :
             (module T0 with type my_type = a) ->
             (module struct type my_type = a end))
          some_expr
        ))
      module type T = module type of X
    end)
)

<<some_expr>> range over expressions and my_type over type names: replace them with the actual expression and the name you want.

Let’s go through it layer by layer, in a top-level. For the sake of this example, we will try to infer the type of 5, e.g. implementing type my_type = [%type_of 5].

First we wrap the expression in a function to delay evaluation. This prevents any side effect from happening:

# let some_expr () = 5;;
val some_expr : unit -> int

Type inference is done and the definition almost has the type we want to name.

We will use first-class modules to construct a type declaration. The typechecker requires to name the signatures that are used in first-class modules, so we define T0:

# module type T0 = sig type my_type end;;

But the benefits of first-class modules is that they are actually expressions, which will allow type inference to fill the type information.

The next line is trickier, but look at the answer of ocaml:

# module X = (val (
             (fun (type a) (_ : unit -> a) :
                (module T0 with type my_type = a) ->
                (module struct type my_type = a end))
             some_expr
           ));;
module X : sig type my_type = int end

It seems we are almost done: we have a type definition X.my_type = int in the environment. It was produced by the type checker (we never referred to int ourselves). We could get away with a simple include X tobring type my_int = int in the environment. But that would also leave some garbage names behind (some_expr, T0, X)... It is bad to pollute the environment :).

That being said, let's review the last definition:

(fun (type a) (_ : unit -> a) :
   (module T0 with type my_type = a) ->
   (module struct type my_type = a end))

This function has type (unit -> 'a) -> (module T0 with type my_type = 'a). It fills two purposes: extracting the type at the righthand side of the arrow to get rid of the unit we introduced earlier, and producing a first class module with the signature we are looking for.

The with constraint plays an important role. It turns the abstract type my_type of T0 to a manifest (type my_type = a). That's key to injecting a type variable in a type constructor.

The functions is then applied to some_expr: unification replaces 'a with int and the whole evaluates to a value of type (module T0 with type my_type = int).

At last, (val (...)) "opens the package": it turns back the first-class module, a term, into a module.

Now how do we clean the environment? In an ideal world, we would simply wrap the definition and project X:

include struct
         let some_expr () = <<some_expr>>
         module type T0 = sig type my_type end
         module X = (val (
             (fun (type a) (_ : unit -> a) :
                (module T0 with type my_type = a) ->
                (module struct type my_type = a end))
             some_expr
           ))
       end.X

However projecting from a syntactic structure is not allowed in OCaml. It has to be bound to a name to allow projection... Like the argument of a functor! An anonymous functor can do the projection without leaving trace.

The type of this functor is a bit tricky to define. It takes an argument that contains the X we want to project. The implementation could look like: functor (M : sig module X end) -> M.X.

But we are not allowed to define the module X without giving it a type. But the functor doesn't do anything with the contents of X, it just returns it. An abstract module type is therefore sufficient: functor (M : sig module type T module X : T end) -> M.X.

The type of this functor is thus: functor (M : sig module type T module X : T end) -> M.T.

Which T should we pass to the functor? The T0 above could do, but my_type is abstract in this signature. The = int would be lost, defeating our purpose. One more layer of module magic saves us: module type T = module type of X. T is exactly the type of X!

Putting everything together, we can construct the tricky structure and immediately project from it. Let's try in a fresh interpreter:

# include (
     (functor (M : sig module type T module X : T end) -> M.X)
       (struct
         let some_expr () = 5
         module type T0 = sig type my_type end
         module X = (val (
             (fun (type a) (_ : unit -> a) :
                (module T0 with type my_type = a) ->
                (module struct type my_type = a end))
             some_expr
           ))
         module type T = module type of X
       end)
   );;
type my_type = int
# 

It produces the type definition we were looking for, nothing more, nothing less :-). Note that the encoding not only does not pollute the global environment, but it also preserves the scope of <<some_expr>>. The rewriting is eco-friendly and hygienic: T0, X, etc, are not yet visible, there is no risk of name clash.

Conclusion

We just provided a syntactic construction that implements a type_of operator. It is limited to inferring monomorphic types. A polymorphic definition such as [] will lead to an error like:

Error: The type of this packed module contains variables:
       (module type_of with type my_type = 'a list)

which can be slightly improved to:

Error: The type of this packed module contains variables:
       (module type'of with type my_type = 'a list)

Not perfect but quite understandable. With the correct instrumentation, a PPX could report a precise location to the user. Now if only someone could write this PPX :P.

Finally, like Menhir inference trick, this construction easily extends to multiple definitions, e.g.

type a = [%type_of foo]
and b = [%type_of bar]

~=

include (
  (functor (M : sig module type T module X : T end) -> M.X)
    (struct
      let some_expr () = (foo), (bar)
      module type T0 = sig type a type b end
      module X = (val (
          (fun (type a b) (_ : unit -> a * b) :
             (module T0 with type a = a and type b = b) ->
             (module struct type nonrec a = a and b = b end))
          some_expr
        ))
      module type T = module type of X
    end)
)