Pretty-printing with dominators

2020-11-14 17:55:44+01:00

A static analysis that I am working on generates complex intermediate data structures. To help debugging it, I wrote a few specialized pretty-printers. But these structures rely a lot on sharing (as in hash consing). The output of pretty-printers would easily blow up in size. To the extent that it was not helping debugging anymore.

Here is a simple example in javascript to see how things can go wrong:

leaf = "Some tag"
tree = { l: leaf, r: leaf }
tree = { l: tree, r: tree }
tree = { l: tree, r: tree }
console.info(JSON.stringify(tree, null, 2))

Which outputs:

{
  "l": {
    "l": {
      "l": "Some tag",
      "r": "Some tag"
    },
    "r": {
      "l": "Some tag",
      "r": "Some tag"
    }
  },
  "r": {
    "l": {
      "l": "Some tag",
      "r": "Some tag"
    },
    "r": {
      "l": "Some tag",
      "r": "Some tag"
    }
  }
}

A tree described in n steps turns into a JSON file of size O(2^n). This is a pathological example, but even practical cases can grow enough to make them very hard to read.

Post-order traversal

To help make sense of it, I had to make sharing explicit in the pretty-printed term. A common notation for that is to use "let binders". The example above could be printed as:

let n0 = "Some tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = { "l": n1, "r": n1 } in
{ "l": n2, "r": n2 }

But how do we decide where to introduce them?

c9x suggested a simple solution: introduce the bindings in the post-order traversal of the graph.

we visit all nodes, labeling them by their index in the traversal
when we revisit a node, we mark it as shared, to remember to introduce it by a let-binding, and skip its children
we then print all shared nodes using let-binders, ordered by their post-order label.

The post-order ensures that children nodes are bound before their parents.

Update: @c-cube remarked that this analysis is suitable to export in Lean export format

Binding at dominators

A simple post-order traversal would be good enough for a printer, but it is not really pretty. Shared nodes are all bound at the top-level: we recovered a compact notation but it doesn't preserve the locality of nodes. This example exhibits the problem:

lleaf = "Left tag"
ltree = { l: lleaf, r: lleaf }
ltree = { l: ltree, r: ltree }
rleaf = "Right tag"
rtree = { l: rleaf, r: rleaf }
rtree = { l: rtree, r: rtree }
tree = { l: ltree, r: rtree }

It gets pretty-printed by the post-order algorithm as:

let n0 = "Left tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = "Right tag" in
let n3 = { "l": n2, "r": n2 } in
{ "l": { "l": n1, "r": n1 },
  "r": { "l": n3, "r": n3 } }

Sharing is represented, but it is not that easy to make sense of the structure. It would be much easier to contextualize if related components were close to each other.

Like by printing them in the smallest scope possible... Which, as observed by my friend @trefis, would be at the node that dominates them!

If we introduce the let-bindings at the dominating nodes we get:

{
    l: let n0 = "Left tag" in
       let n1 = { l: n0, r: n0 } in
       { l: n1, r: n1},
    r: let n2 = "Right tag" in
       let n3 = { l: n2, r: n2 } in
       { l: n3, r: n3}
}

Which proves much better in practice.

Computing dominators

There are two popular algorithms to compute dominators:

The Lengauer-Tarjan algorithm, described in "A fast algorithm for finding dominators in a flowgraph". It has the best known runtime for this problem (|E|*\alpha(|E|,|V|), like union-find).
The more recent algorithm proposed by Cooper, Harvey & Kennedy, "A Simple, Fast Dominance Algorithm". Its worst case is O(n^2), but it is much easier to implement.

Furthermore, the simple one is linear for graphs that have a simple "loop connectedness". Because the datastructures I care about are cycle-free, it is indeed a perfect fit.

If they were cyclic it would be worth considering Lengauer-Tarjan to avoid degenerate cases. And to turn some "let" bindings into "let-rec" ones 🙂.

Conclusion

Dominators are useful for pretty-printing directed graphs in a textual form:

explicit sharing reduces the size of the output and make it easier to digest,
introducing the binders in the dominators reveals the shape of sharing and cycles.

I was not expecting a dominance problem to appear in the middle of a pretty-printing algorithm. This was a pleasant discovery that, in retrospect, seems kind of obvious.

Thanks to @Armael for some corrections