Pretty-printing with dominators

2020-11-14 17:55:44+01:00

A static analysis that I am working on generates complex intermediate data structures. To help debugging it, I wrote a few specialized pretty-printers. But these structures rely a lot on sharing (as in hash consing). The output of pretty-printers would easily blow up in size. To the extent that it was not helping debugging anymore.

Here is a simple example in javascript to see how things can go wrong:

leaf = "Some tag"
tree = { l: leaf, r: leaf }
tree = { l: tree, r: tree }
tree = { l: tree, r: tree }
console.info(JSON.stringify(tree, null, 2))

Which outputs:

{
  "l": {
    "l": {
      "l": "Some tag",
      "r": "Some tag"
    },
    "r": {
      "l": "Some tag",
      "r": "Some tag"
    }
  },
  "r": {
    "l": {
      "l": "Some tag",
      "r": "Some tag"
    },
    "r": {
      "l": "Some tag",
      "r": "Some tag"
    }
  }
}

A tree described in n steps turns into a JSON file of size O(2^n). This is a pathological example, but even practical cases can grow enough to make them very hard to read.

Post-order traversal

To help make sense of it, I had to make sharing explicit in the pretty-printed term. A common notation for that is to use "let binders". The example above could be printed as:

let n0 = "Some tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = { "l": n1, "r": n1 } in
{ "l": n2, "r": n2 }

But how do we decide where to introduce them?

c9x suggested a simple solution: introduce the bindings in the post-order traversal of the graph.

The post-order ensures that children nodes are bound before their parents.

Update: @c-cube remarked that this analysis is suitable to export in Lean export format

Binding at dominators

A simple post-order traversal would be good enough for a printer, but it is not really pretty. Shared nodes are all bound at the top-level: we recovered a compact notation but it doesn't preserve the locality of nodes. This example exhibits the problem:

lleaf = "Left tag"
ltree = { l: lleaf, r: lleaf }
ltree = { l: ltree, r: ltree }
rleaf = "Right tag"
rtree = { l: rleaf, r: rleaf }
rtree = { l: rtree, r: rtree }
tree = { l: ltree, r: rtree }

It gets pretty-printed by the post-order algorithm as:

let n0 = "Left tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = "Right tag" in
let n3 = { "l": n2, "r": n2 } in
{ "l": { "l": n1, "r": n1 },
  "r": { "l": n3, "r": n3 } }

Sharing is represented, but it is not that easy to make sense of the structure. It would be much easier to contextualize if related components were close to each other.

Like by printing them in the smallest scope possible... Which, as observed by my friend @trefis, would be at the node that dominates them!

If we introduce the let-bindings at the dominating nodes we get:

{
    l: let n0 = "Left tag" in
       let n1 = { l: n0, r: n0 } in
       { l: n1, r: n1},
    r: let n2 = "Right tag" in
       let n3 = { l: n2, r: n2 } in
       { l: n3, r: n3}
}

Which proves much better in practice.

Computing dominators

There are two popular algorithms to compute dominators:

Furthermore, the simple one is linear for graphs that have a simple "loop connectedness". Because the datastructures I care about are cycle-free, it is indeed a perfect fit.

If they were cyclic it would be worth considering Lengauer-Tarjan to avoid degenerate cases. And to turn some "let" bindings into "let-rec" ones 🙂.

Conclusion

Dominators are useful for pretty-printing directed graphs in a textual form:

I was not expecting a dominance problem to appear in the middle of a pretty-printing algorithm. This was a pleasant discovery that, in retrospect, seems kind of obvious.

Thanks to @Armael for some corrections