<h1 id="imperative-bbt-part-2-binary-search-trees">Imperative BBT part 2: Binary Search Trees</h1>
<p>2023-03-07
09:00:00 CET</p>
<p>First we define the type of BST labelled with integers:</p>
<pre><code class="language-c">struct bst_node {
struct node node;
int value;
};
struct bst_node *
bst_left(struct bst_node *node)
{
return (struct bst_node *)node->node.left;
}
struct bst_node *
bst_right(struct bst_node *node)
{
return (struct bst_node *)node->node.right;
}
struct bst_node *
bst_make(struct bst_node *self, struct bst_node *left, struct bst_node *right)
{
return (struct bst_node*)make_tree(&self->node, &left->node, &right->node);
}
</code></pre>
<p>These are mostly wrappers over the <code>struct node</code> type.</p>
<h2 id="insertion">Insertion</h2>
<p>Insertion of a node in a BST is simple: if the tree is empty, we simply return the inserted node. Otherwise, we check if we should insert in the left or the right sub-trees by comparing the label.</p>
<p>Like before, the node to insert has to be provided by the caller, so that this code does not deal with allocation.</p>
<pre><code class="language-c">struct bst_node *
bst_insert(struct bst_node *new_node, struct bst_node *tree)
{
if (!tree)
return bst_make(new_node, NULL, NULL);
struct bst_node *left = bst_left(tree);
struct bst_node *right = bst_right(tree);
if (new_node->value < tree->value)
left = bst_insert(new_node, left);
else
right = bst_insert(new_node, right);
return bst_make(tree, left, right);
}
</code></pre>
<p>This code is literally the same as insertion in a plain BST. However, since <code>bst_make</code> uses <code>make_tree</code> under the hood, we get a balanced tree without any effort!</p>
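<p>As a quick sanity check, here is a minimal usage sketch (not part of the original interface): it assumes the snippets above are compiled together with <code>balanced.c</code>/<code>balanced.h</code> from part 1, and it allocates nodes from a local array only for the sake of the example, since any allocation strategy works:</p>
<pre><code class="language-c">#include <stdio.h>
#include "balanced.h"

int main(void)
{
    static struct bst_node nodes[100];
    struct bst_node *root = NULL;
    for (int i = 0; i < 100; ++i)
    {
        // Insert the values 0..99 in a scrambled order
        nodes[i].value = (i * 37) % 100;
        root = bst_insert(&nodes[i], root);
    }
    // With 100 nodes, a balanced tree should have a small height (around 7 to 9)
    printf("height = %d\n", root->node.height);
    return 0;
}
</code></pre>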
<h2 id="deletion">Deletion</h2>
<p>The story is almost the same for deletion: we implement the usual BST deletion algorithm. The deleted node, if any, is returned to the caller for memory management purposes.</p>
<pre><code class="language-c">struct bst_node *
bst_delete(struct bst_node *tree, int value, struct bst_node **deleted)
{
if (!tree)
{
*deleted = NULL;
return NULL;
}
struct bst_node *left = bst_left(tree);
struct bst_node *right = bst_right(tree);
if (value == tree->value)
{
*deleted = tree;
return bst_join(left, right);
}
if (value < tree->value)
left = bst_delete(left, value, deleted);
else
right = bst_delete(right, value, deleted);
return bst_make(tree, left, right);
}
</code></pre>
<p>However, we need a new operation: <code>bst_join</code>, that constructs a tree by concatenating, in order, the nodes of two trees.
This new helper is necessary because after deleting a node, we still need to rebuild a tree from its two sub-trees. We will look at it now.</p>
<h3 id="the-join-operation">The join operation</h3>
<p>The join operation isn't specific to BSTs (it doesn't depend on labels); it can
be defined on any balanced tree. We can implement it by building upon
<code>make_tree</code>, which relieves us from thinking about balancing.</p>
<p>However, it can be useful to look at the height to decide which sub-tree to
traverse. By recursing on the deeper one, we minimize the balancing work
needed... Though this is a minor optimization.</p>
<pre><code class="language-c">struct node *
join_tree(struct node *left, struct node *right)
{
if (!right)
return left;
if (!left)
return right;
if (left->height < right->height)
return make_tree(right, join_tree(left, right->left), right->right);
else
return make_tree(left, left->left, join_tree(left->right, right));
}
</code></pre>
<p>And finally, we lift the operation to work on BSTs:</p>
<pre><code class="language-c">struct bst_node *
bst_join(struct bst_node *left, struct bst_node *right)
{
return (struct bst_node*)join_tree(&left->node, &right->node);
}
</code></pre>
<h2 id="result">Result</h2>
<p>In this post we saw that the primitive <code>make_tree</code> from the first article makes the implementation of <em>balanced</em> BSTs trivial.
The insertion and deletion algorithms are very close to the naive ones, yet we get balancing almost for free.</p><h1 id="imperative-balanced-binary-trees-part-1-core-balancing">Imperative Balanced Binary Trees, part 1: Core balancing</h1>
<p>2023-03-05
08:40:00 CET</p>
<p>I used to find balanced trees tedious to deal with in an imperative setting,
particularly in C, though I love them in functional languages.</p>
<p>In this series I will describe an implementation strategy that achieves all I
could hope for in a C implementation: safe, modular, reasonably fast,
independent of an allocation strategy, easy to extend... and reasonably simple.</p>
<p>Balancing is achieved by calling a <em>single helper function</em>. Other common
functions, like BST insertion and deletion, can be derived from it.</p>
<p>This idea is not new: in <a href="http://arxiv.org/abs/1602.02120">Parallel Ordered Sets Using Join</a>, Blelloch et al
use a similar approach as a building block for parallel implementations of
binary trees, and afaik the core idea goes back to
<a href="https://www.cambridge.org/core/journals/journal-of-functional-programming/article/functional-pearls-efficient-setsa-balancing-act/0CAA1C189B4F7C15CE9B8C02D0D4B54E">Functional Pearls Efficient sets—a balancing act</a> by Adams.
But these papers are concerned with immutable sets, and this blog post is about
adapting the method to C.</p>
<p>The series is split into 4 posts:</p>
<ul>
<li>this post implements balancing and exposes the modular interface</li>
<li>in <a href="2023-03-07-binary-search-trees.md">part 2</a>, we will implement binary search trees (BST)</li>
<li>in part 3, we show how to safely maintain additional structural invariants</li>
<li>in part 4, we extend the approach to other balancing criteria </li>
</ul>
<h2 id="heightbalanced-trees">Height-balanced trees</h2>
<p>We start by defining a simple flavor of height-balanced trees. Here is a node:</p>
<pre><code class="language-c">struct node {
struct node *left, *right;
int height;
};
</code></pre>
<p>A node is represented by a <code>struct node *</code>. <code>NULL</code> denotes the empty tree. A non-empty node has two subtrees and keeps track of its height.</p>
<p>Next, we implement a <code>node_height</code> function to get the height while handling the empty case, and an initialization function that maintains the height invariant:</p>
<pre><code class="language-c">// Returns the height of node, handling the NULL case
static int node_height(struct node *n)
{
return n ? n->height : 0;
}
// precondition: self != NULL
// left and right can be NULL to denote empty sub-trees.
// set_node(self, left, right) setups self so that it represents
// a tree with left and right subtrees and the correct height,
// and returns self.
static struct node *set_node(struct node *self,
struct node *left, struct node *right)
{
self->left = left;
self->right = right;
int lh = node_height(left);
int rh = node_height(right);
self->height = 1 + ((lh > rh) ? lh : rh);
return self;
}
</code></pre>
<p>No other function should directly mutate a <code>struct node</code>, so we can be confident the height is always set correctly.</p>
<h2 id="constructing-balanced-nodes">Constructing balanced nodes</h2>
<p>We will use the height field to enforce the same criterion as the one maintained by AVL trees: a tree is balanced if (1) its two sub-trees are balanced, (2) their heights differ by at most one. </p>
<pre><code class="language-c">// precondition: max_height >= min_height
// is_balanced(min_height, max_height) returns true if a tree
// with sub-nodes of these heights is balanced
static bool is_balanced(int min_height, int max_height)
{
return (max_height - min_height) <= 1;
}
</code></pre>
<p>By relying on "smart constructors", the tree rotation functions can be given a functional feeling:</p>
<pre><code class="language-c">typedef struct node *
make_fun(struct node *self, struct node *left, struct node *right);
static struct node *
rot_left(make_fun make, struct node *self,
struct node *left, struct node *right)
{
return make(right, make(self, left, right->left), right->right);
}
static struct node *
rot_right(make_fun make, struct node *self,
struct node *left, struct node *right)
{
return make(left, left->left, make(self, left->right, right));
}
</code></pre>
<p>Now, the tree balancing functions. To make a balanced node, we check if the two children are already balanced. If yes, not much needs to be done: we directly apply the constructor. If not, we go down the deeper sub-tree and try again.</p>
<pre><code class="language-c">static struct node *
node_left(struct node *self, struct node *left, struct node *right)
{
if (is_balanced(node_height(left), node_height(right)))
return set_node(self, left, right);
if (right && node_height(right->right) < node_height(right->left))
right = rot_right(node_left, right, right->left, right->right);
return rot_left(node_left, self, left, right);
}
static struct node *
node_right(struct node *self, struct node *left, struct node *right)
{
if (is_balanced(node_height(right), node_height(left)))
return set_node(self, left, right);
if (left && node_height(left->left) < node_height(left->right))
left = rot_left(node_right, left, left->left, left->right);
return rot_right(node_right, self, left, right);
}
struct node *make_tree(struct node *self, struct node *left, struct node *right)
{
if (node_height(left) <= node_height(right))
return node_left(self, left, right);
else
return node_right(self, left, right);
}
</code></pre>
<p>And that's it for the balancing! To sum up, the two big tasks are:</p>
<ul>
<li>first, implementing a balancing criterion using a function <code>set_node</code> that maintains some metric used by another function, <code>is_balanced</code>, to check if the criterion is satisfied</li>
<li>then, the <code>make_tree</code> function constructs trees balanced according to the criterion</li>
</ul>
<p>Note that this code does not need to do any memory management: all functions take the resulting tree as an argument (a style sometimes called "destination-passing").</p>
<h2 id="result">Result</h2>
<p>You can download the implementation we reached so far (under MIT license).</p>
<p><a href="balanced.c">balanced.c</a>, <a href="balanced.h">balanced.h</a>:</p>
<pre><code class="language-c">#ifndef BALANCED_H
#define BALANCED_H
struct node {
struct node *left, *right;
int height;
};
struct node *
make_tree(struct node *self, struct node *left, struct node *right);
#endif /*!BALANCED_H*/
</code></pre>
<p>The interface is remarkably simple, isn't it?
In the next post, we will implement binary search trees on top of this interface.</p>
<p>Here is a simple test program (<a href="test.c">test.c</a>, <a href="Makefile">Makefile</a>), to see how this can be used to make custom balanced data structures:</p>
<pre><code class="language-c">#include <stdlib.h>
#include <stdio.h>
#include "balanced.h"
// Always insert at the right end of the tree
struct node *insert_right(struct node *current, struct node *new)
{
if (!current)
return new;
return make_tree(current, current->left, insert_right(current->right, new));
}
// A simple printer, to visually confirm the trees are balanced
void print_node(int indent, struct node *node)
{
for (int i = 0; i < indent; ++i)
fputc(' ', stdout);
fputs("- ", stdout);
if (!node)
fputs("leaf\n", stdout);
else
{
fprintf(stdout, "node, height=%d\n", node->height);
print_node(indent + 1, node->left);
print_node(indent + 1, node->right);
}
}
int main(int argc, char **argv)
{
struct node nodes[1024];
struct node *root = NULL;
for (int i = 0; i < 1024; ++i)
{
// Initialize an empty node
struct node *node = make_tree(&nodes[i], NULL, NULL);
// Insert it at the right end
root = insert_right(root, node);
}
// Print resulting tree
print_node(0, root);
return 0;
}
</code></pre>
<p>Or here is a variant to insert in random locations:</p>
<pre><code class="language-c">struct node *insert_random(int seed, struct node *current, struct node *new)
{
if (!current)
return new;
if ((seed & 3) == 0)
current->left = insert_random(seed >> 1, current->left, new);
else
current->right = insert_random(seed >> 1, current->right, new);
return make_tree(current, current->left, current->right);
}
</code></pre>
<p>Try it by replacing <code>insert_right(root, node)</code> with <code>insert_random(rand(), root, node)</code>.</p><h1 id="high-quality-scrolling-with-emacs">High quality scrolling with Emacs</h1>
<p>2023-03-05
11:30:30 CET</p>
<p><a href="https://doomemacs.org">Doom Emacs</a> convinced me to switch to emacs after being a long time vim user.
Naturally, I spent a lot of time tweaking my emacs setup 😃 and I settled with Emacs
29.0 using <a href="https://www.emacswiki.org/emacs/GccEmacs">native-compilation</a> (see also <a href="https://akrl.sdf.org/gccemacs.html">gcc-emacs</a>).</p>
<p>On macOS, I use a custom built emacs-plus <a href="https://github.com/d12frosted/homebrew-emacs-plus">formula</a>. The specific command line is:</p>
<pre><code class="language-shell">$ brew install emacs-plus@29 --with-xwidgets --with-native-comp
</code></pre>
<p>Native comp speeds things up and Xwidgets adds a built-in web browser based on webkit.
As for the version, 29 feels much snappier than 28. This is more
pronounced on macOS; 28 was OK on Linux. No idea why.</p>
<h2 id="highquality-scrolling">High-Quality Scrolling</h2>
<p>One of the additions in Emacs 29 is the <a href="https://www.emacswiki.org/emacs/SmoothScrolling#h5o-6"><code>pixel-scroll-precision-mode</code></a>. Just
enable it and, if you are using a windowed version of emacs, you should have a
vertical scroll that is pixel-based rather than line-based.</p>
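<p>If your configuration framework does not already expose it as an option, enabling it is a single line in your init file (plain Emacs 29, nothing Doom-specific):</p>
<pre><code class="language-elisp">;; Pixel-based vertical scrolling (Emacs 29+, graphical frames only)
(pixel-scroll-precision-mode 1)
</code></pre>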
<p>This feels much better when using trackpad scrolling:</p>
<p><div align=center><video controls width=435>
<source src="scroll.mp4" type="video/mp4">
Download the <a href="scroll.mp4">MP4</a> video.
</video></div></p>
<h3 id="fixing-wheelbased-horizontal-scrolling-and-text-scaling">Fixing wheel-based horizontal scrolling and text scaling</h3>
<p>By default, Emacs regroups multiple scroll events into a single one large enough
to scroll one line. This produces far fewer events, at a coarser precision.
This goes against the smooth experience sought by <code>pixel-scroll-precision-mode</code>,
so it disables this feature by setting <code>mwheel-coalesce-scroll-events</code> to <code>nil</code>.</p>
<p>Unfortunately, this affects all wheel events, while <code>pixel-scroll-precision-mode</code>
cares only about vertical scrolling. Other wheel-based features go crazy (for
instance, scaling text with a mouse wheel is roughly 20 times faster on my setup,
quite inconvenient). </p>
<p>My tentative fix is to switch coalescing on and off based on the action.</p>
<p>For this I defined two helper functions:</p>
<pre><code class="language-elisp">(defun filter-mwheel-always-coalesce (orig &rest args)
"A filter function suitable for :around advices that ensures only
coalesced scroll events reach the advised function."
(if mwheel-coalesce-scroll-events
(apply orig args)
(setq mwheel-coalesce-scroll-events t)))
(defun filter-mwheel-never-coalesce (orig &rest args)
"A filter function suitable for :around advices that ensures only
non-coalesced scroll events reach the advised function."
(if mwheel-coalesce-scroll-events
(setq mwheel-coalesce-scroll-events nil)
(apply orig args)))
</code></pre>
<p>Before forwarding a scroll event, they check whether
<code>mwheel-coalesce-scroll-events</code> matches the expectation, and either forward the
event or change the configuration. </p>
<p>When switching, the event is dropped, which seems questionable but is actually
preferable in my experience.</p>
<p>Finally, we can advise the wheel sensitive functions accordingly:</p>
<pre><code class="language-elisp">; Don't coalesce for high precision scrolling
(advice-add 'pixel-scroll-precision :around #'filter-mwheel-never-coalesce)
; Coalesce for default scrolling (which is still used for horizontal scrolling)
; and text scaling (bound to ctrl + mouse wheel by default).
(advice-add 'mwheel-scroll :around #'filter-mwheel-always-coalesce)
(advice-add 'mouse-wheel-text-scale :around #'filter-mwheel-always-coalesce)
</code></pre>
<h3 id="horizontal-scrolling">Horizontal scrolling?</h3>
<p>By default horizontal scrolling is not enabled in Emacs.</p>
<p>To change that:</p>
<pre><code class="language-elisp">(setq mouse-wheel-tilt-scroll t)
</code></pre>
<p>If you like reversed / natural scrolling, also set:</p>
<pre><code class="language-elisp">(setq mouse-wheel-flip-direction t)
</code></pre><h1 id="even-more-compact-lexer-table">Even more compact lexer table</h1>
<p>2022-04-14
15:44:30+09:00</p>
<p>Some time ago, I blogged about the <a href="../2020-05-02-compact-lexer-table-representation">representation of lexer tables</a>. That post introduced a common scheme originally described in the Dragon Book, together with a practical implementation.</p>
<p>In a footnote, I mentioned that I thought that the pseudo-code of the Dragon Book was wrong:</p>
<ul>
<li>The traditional compacting scheme encodes transitions as a pair of default state and a sparse vector; the default state is the most common target, and transitions that target it are not explicitly represented.</li>
<li>By recursively calling <code>nextState</code>, the code did not interpret the default state as a target but rather as a fallback state: if the transition is not part of the sparse vector, look again for the same transition in the default state.</li>
</ul>
<p>But recently, I worked on a project which exhibits a variant of the problem for which the Dragon Book typo might be beneficial.</p>
<p>Namely:</p>
<ul>
<li>many states have a lot of transitions in common (for most of the alphabet, they have the same target states, only a few cases differ)</li>
<li>lookup performance is not critical</li>
<li>the alphabet and the automaton are quite large</li>
</ul>
<p>Given these constraints, it is worth spending some time on making the table more compact, at the cost of slightly worse lookups.</p>
<p>This opens up an interesting question: how to construct a compact table with this encoding? How to pick good fallback states?</p>
<h2 id="finding-common-transitions-and-fallback-states">Finding common transitions and fallback states</h2>
<p>Let’s assume we have a set of states <span class="maths">Q</span>, an alphabet <span class="maths">\Sigma</span>, and a transition function <span class="maths">\delta : Q \times \Sigma \rightarrow Q</span>.</p>
<p>For each state <span class="maths">q</span>, we decide to represent the function <span class="maths">\delta(q, \_)</span> as either:</p>
<ul>
<li>a total function <span class="maths">\delta_q : \Sigma \rightarrow Q</span></li>
<li>a pair of a partial function <span class="maths">\delta_q : \Sigma \rightharpoonup Q</span> and a fallback state <span class="maths">f_q : Q</span></li>
</ul>
<p>The original function <span class="maths">\delta</span> can be recovered from this decomposition using:</p>
<p><div class="maths">
\delta(q,a) = \begin{cases}
\delta_q(a) & \text{if } a \in \mathrm{dom}(\delta_q) \\
\delta(f_q,a) & \text{otherwise}
\end{cases}
</div></p>
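<p>In code, a lookup following this definition is a small loop. Here is a sketch with hypothetical table names (the actual layout depends on how the partial functions are packed); it terminates because the transition function of the root is total, so its fallback is never consulted:</p>
<pre><code class="language-c">#include <stdint.h>

typedef int16_t state_t;
#define NO_STATE ((state_t)-1)

// Assumed to be produced by the table generator:
// - sparse_lookup returns the explicit transition of q on a, or NO_STATE;
// - fallback[q] is the parent of q in the spanning tree (unused at the root,
//   whose transition function is total).
extern state_t sparse_lookup(state_t q, uint8_t a);
extern const state_t fallback[];

state_t next_state(state_t q, uint8_t a)
{
    for (;;)
    {
        state_t t = sparse_lookup(q, a);
        if (t != NO_STATE)
            return t;        // transition stored explicitly for this state
        q = fallback[q];     // otherwise, retry in the fallback state
    }
}
</code></pre>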
<p>Let’s define a distance function <span class="maths">d(q_1,q_2)</span> on states:</p>
<p><div class="maths">
d(q_1,q_2) = \lvert \left\{ a \in \Sigma \mid \delta(q_1,a) \neq \delta(q_2,a) \right\} \rvert
</div></p>
<p>The distance function measures the number of transitions in which two states disagree.</p>
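<p>Computed naively, the distance is just a loop over the alphabet. A small sketch, reusing the <code>state_t</code> convention from the sketch above and assuming the dense transition matrix of the front-end is available at construction time:</p>
<pre><code class="language-c">#define SIGMA 256   // alphabet size (8-bit characters)

// d(q1, q2): number of symbols on which the two states disagree
static int distance(const state_t delta[][SIGMA], state_t q1, state_t q2)
{
    int d = 0;
    for (int a = 0; a < SIGMA; ++a)
        if (delta[q1][a] != delta[q2][a])
            d++;
    return d;
}
</code></pre>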
<p>We now consider the dense graph on <span class="maths">Q</span> with weights defined by <span class="maths">d</span>. A <a href="https://en.wikipedia.org/wiki/Minimum_spanning_tree">minimum spanning tree</a> of this graph relates each state to one of its closest neighbors, in terms of shared transitions.</p>
<p>Let's pick a root <span class="maths">q_r</span> and call <span class="maths">\mathrm{parent}: Q\rightarrow Q</span> the function that maps a state to its parent in the minimum spanning tree rooted at <span class="maths">q_r</span>.</p>
<p>For each state <span class="maths">q</span> we can now define:</p>
<ul>
<li><p><span class="maths">\delta_q(a) = \delta(q, a)</span>, if <span class="maths">q</span> = <span class="maths">q_r</span> (the transition function of the root is total)</p>
</li>
<li><p><span class="maths">\delta_q(a) = \delta(q,a)</span>, if <span class="maths">\delta(q,a) \neq \delta(\mathrm{parent}(q), a)</span> (the transition function of a child is defined when it differs from the parent)</p>
</li>
<li><p><span class="maths">f_q = \mathrm{parent}(q)</span> (the fallback state is the parent)</p>
</li>
</ul>
<h2 id="picking-a-root-and-splitting-trees">Picking a root and splitting trees</h2>
<p>The method above already maximizes sharing between transition functions. However, it does nothing to minimize lookup length.</p>
<p>In the worst case, lookup might start from one leaf of the tree and try all nodes on the branch to the root before succeeding. So we would like to minimize the length of the branches.</p>
<p>Various heuristics can be useful to choose the root:</p>
<ul>
<li>A first approximation is to pick a node in the middle of the tree. This minimizes the length of the longest branch (the worst lookup will be half the diameter of the tree).</li>
<li>Assuming that all states are equally likely to be looked up, we can also pick a root that minimizes the depth of nodes.</li>
</ul>
<p>Both of these metrics can be computed easily. When evaluated successively along a branch of the tree, they are unimodal: decreasing as we get closer to the optimal node, increasing afterwards. So a simple traversal lets us pick the candidate root.</p>
<p>Finally, after picking a root, we might still be unhappy with the height of the tree. In this case, we can simply split into two trees (at the cost of losing some sharing), and repeat the process on each tree.</p>
<p>These are all greedy heuristics. They give a good but not optimal decomposition. The <a href="https://cs.stackexchange.com/questions/64791/diameter-constrained-minimum-spanning-tree-problem">minimum diameter minimum spanning tree</a> problem seems to be <a href="https://en.wikipedia.org/wiki/NP-hardness">NP-hard</a> (and some variants <a href="https://en.wikipedia.org/wiki/APX">APX-hard</a>), so I did not spend much time on it; in practice, the greedy results were good enough.</p>
<h2 id="practical-implementation">Practical implementation</h2>
<p>My final implementation uses both a default state and fallback states: the root of the tree has a default state and the children fall back to their parent.</p>
<p>Furthermore, the sharing algorithm is not applied to all states at once. Rather, we group states by their default state (the most common target of their transition function). Then we compute a tree for each group of states with the same default.</p>
<p>On my test, fallback states led to 60% fewer transitions than the default state alone. And as the vectors were much sparser, the packing heuristic performed better (though I did not measure the overhead precisely).</p><h1 id="a-typeof-operator-in-ocaml">A typeof operator in OCaml</h1>
<p>2021-06-25
21:26:20+09:00</p>
<p>Let’s say one is implementing a source-to-source rewriter for OCaml (a preprocessor, like a PPX library) and needs to manipulate the type of an expression. They don’t want to execute the expression, just to refer to its type: something like a <code>type of <expr></code> operator.</p>
<p>OCaml lets you bind the type of a sub-expression to a variable, e.g. <code>(<expr> : 'my_var)</code>, and you can then refer to <code>'my_var</code> in the rest of the expression. But can we do the same in a module and bind the type to a type constructor?</p>
<p>In this blog post, I will give a syntactic construction to realize the "type of" operator:</p>
<pre><code class="language-ocaml">type t = [%typeof expr]
</code></pre>
<p>like we can already do for module:</p>
<pre><code class="language-ocaml">module type T = module type of M
</code></pre>
<p>Menhir, the parser generator, needs something similar to infer the type of semantic actions and non-terminals. It is useful to improve usability and necessary for the <a href="http://gallium.inria.fr/~fpottier/menhir/manual.pdf#subsection.9.3">inspection features</a> (see ‘Inspection API’). To do so, it runs <code>ocaml</code> a first time in isolation to infer the interface of a specially crafted file, then it parses that interface.
This complicates Menhir and the build process <a href="http://gallium.inria.fr/~fpottier/menhir/manual.pdf#section.14">significantly</a> (see ‘Interaction with build systems’) and makes it less flexible. Could we do the same in a single pass, directly in OCaml code?</p>
<h1 id="invocation-of-cthulhu">Invocation of C(++)thulhu</h1>
<p>It turns out that the following encoding does just that:</p>
<pre><code class="language-ocaml">type my_type = [%type_of <<some_expr>>]
~=
include (
(functor (M : sig module type T module X : T end) -> M.X)
(struct
let some_expr () = <<some_expr>>
module type T0 = sig type my_type end
module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
))
module type T = module type of X
end)
)
</code></pre>
<p><code><<some_expr>></code> ranges over expressions and <code>my_type</code> over type names: replace them with the actual expression and the name you want.</p>
<p>Let’s go through it layer by layer, in a top-level. For the sake of this example, we will try to infer the type of <code>5</code>, e.g. implementing <code>type my_type = [%type_of 5]</code>.</p>
<p>First we wrap the expression in a function to delay evaluation. This prevents any side effect from happening:</p>
<pre><code class="language-ocaml"># let some_expr () = 5;;
val some_expr : unit -> int = <fun>
</code></pre>
<p>Type inference is done and the definition almost has the type we want to name.</p>
<p>We will use first-class modules to construct a type declaration. The typechecker requires the signatures used in first-class modules to be named, so we define <code>T0</code>:</p>
<pre><code class="language-ocaml"># module type T0 = sig type my_type end;;
</code></pre>
<p>But the benefit of first-class modules is that they are actually expressions, which allows type inference to fill in the type information.</p>
<p>The next line is trickier, but look at the answer of <code>ocaml</code>:</p>
<pre><code class="language-ocaml"># module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
));;
module X : sig type my_type = int end
</code></pre>
<p>It seems we are almost done: we have a type definition <code>X.my_type = int</code> in the environment. It was produced by the type checker (we never referred to <code>int</code> ourselves). We could get away with a simple <code>include X</code> to bring <code>type my_type = int</code> into the environment. But that would also leave some garbage names behind (<code>some_expr</code>, <code>T0</code>, <code>X</code>)... It is bad to pollute the environment :).</p>
<p>That being said, let's review the last definition:</p>
<pre><code class="language-ocaml">(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
</code></pre>
<p>This function has type <code>(unit -> 'a) -> (module T0 with type my_type = 'a)</code>. It serves two purposes: extracting the type on the right-hand side of the arrow to get rid of the <code>unit</code> we introduced earlier, and producing a first-class module with the signature we are looking for.</p>
<p>The <code>with</code> constraint plays an important role. It turns the abstract type <code>my_type</code> of <code>T0</code> into a manifest one (<code>type my_type = a</code>). That's key to injecting a type variable into a type constructor.</p>
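<p>To see concretely what the constraint changes, here is a small toplevel experiment, independent of the encoding above:</p>
<pre><code class="language-ocaml"># module type T0 = sig type my_type end;;
module type T0 = sig type my_type end
# (module struct type my_type = int end : T0);;
- : (module T0) = <module>
# (module struct type my_type = int end : T0 with type my_type = int);;
- : (module T0 with type my_type = int) = <module>
</code></pre>
<p>Only the second package type records the equality <code>my_type = int</code>; that equality is what the typechecker will write for us in the encoding.</p>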
<p>The function is then applied to <code>some_expr</code>: unification replaces <code>'a</code> with <code>int</code> and the whole evaluates to a value of type <code>(module T0 with type my_type = int)</code>.</p>
<p>At last, <code>(val (...))</code> "opens the package": it turns the first-class module, a term, back into a module.</p>
<p>Now how do we clean the environment? In an ideal world, we would simply wrap the definition and project <code>X</code>:</p>
<pre><code class="language-ocaml">include struct
let some_expr () = <<some_expr>>
module type T0 = sig type my_type end
module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
))
end.X
</code></pre>
<p>However, projecting from a <em>syntactic</em> structure is not allowed in OCaml. It has to be bound to a name to allow projection... Like the argument of a functor! An anonymous functor can do the projection without leaving a trace.</p>
<p>The type of this functor is a bit tricky to define. It takes an argument that contains the <code>X</code> we want to project. The implementation could look like: <code>functor (M : sig module X end) -> M.X</code>.</p>
<p>However, we are not allowed to define the module <code>X</code> without giving it a type. But the functor doesn't do anything with the contents of <code>X</code>; it just returns it. An abstract module type is therefore sufficient: <code>functor (M : sig module type T module X : T end) -> M.X</code>.</p>
<p>The type of this functor is thus: <code>functor (M : sig module type T module X : T end) -> M.T</code>.</p>
<p>Which <code>T</code> should we pass to the functor? The <code>T0</code> above could do, but <code>my_type</code> is abstract in this signature. The <code>= int</code> would be lost, defeating our purpose. One more layer of module magic saves us: <code>module type T = module type of X</code>. <code>T</code> is exactly the type of <code>X</code>!</p>
<p>Putting everything together, we can construct the tricky structure and immediately project from it. Let's try in a fresh interpreter:</p>
<pre><code class="language-ocaml"># include (
(functor (M : sig module type T module X : T end) -> M.X)
(struct
let some_expr () = 5
module type T0 = sig type my_type end
module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
))
module type T = module type of X
end)
);;
type my_type = int
#
</code></pre>
<p>It produces the type definition we were looking for, nothing more, nothing less :-). Note that the encoding not only does not pollute the global environment, but it also preserves the scope of <code><<some_expr>></code>. The rewriting is eco-friendly and hygienic: <code>T0</code>, <code>X</code>, etc., are not visible outside, so there is no risk of name clash.</p>
<h1 id="conclusion">Conclusion</h1>
<p>We just provided a syntactic construction that implements a <code>type_of</code> operator. It is limited to inferring monomorphic types. A polymorphic definition such as <code>[]</code> will lead to an error like:</p>
<pre><code>Error: The type of this packed module contains variables:
(module type_of with type my_type = 'a list)
</code></pre>
<p>which can be slightly improved to:</p>
<pre><code>Error: The type of this packed module contains variables:
(module type'of with type my_type = 'a list)
</code></pre>
<p>Not perfect but quite understandable. With the correct instrumentation, a PPX could report a precise location to the user. Now if only someone could write this PPX :P.</p>
<p>Finally, like Menhir's inference trick, this construction easily extends to multiple definitions, e.g.</p>
<pre><code class="language-ocaml">type a = [%type_of foo]
and b = [%type_of bar]
~=
include (
(functor (M : sig module type T module X : T end) -> M.X)
(struct
let some_expr () = (foo), (bar)
module type T0 = sig type a type b end
module X = (val (
(fun (type a b) (_ : unit -> a * b) :
(module T0 with type a = a and type b = b) ->
(module struct type nonrec a = a and b = b end))
some_expr
))
module type T = module type of X
end)
)
</code></pre><h1 id="ddcutil-controlling-the-brightness-of-an-external-monitor">DDCUTIL: controlling the brightness of an external monitor</h1>
<p>2021-06-25
13:16:27+09:00</p>
<p>TL;DR</p>
<ul>
<li>install <a href="https://www.ddcutil.com">ddcutil</a></li>
<li>decrease brightness with <code>sudo ddcutil setvcp 10 - 25</code></li>
<li>increase brightness with <code>sudo ddcutil setvcp 10 + 25</code></li>
<li>to remove the <code>sudo</code>, setup udev rules as suggested by <code>ddcutil</code> documentation, e.g. <code>/usr/share/ddcutil/data/45-ddcutil-i2c.rules</code> on my setup</li>
</ul>
<p>For some time I wondered why external displays can't be controlled from software just like internal ones. I care mostly about free software systems, but the grass doesn't seem greener on gatekept platforms.</p>
<p>The good news is that, contrary to what I feared, it's not a hardware limitation. Why is software control for external displays so niche? Maybe the market only cares about laptops, and desktops are now an afterthought :-).</p>
<p>Digging a bit on the internet led me first to <a href="https://github.com/kfix/ddcctl">ddcctl</a> for macOS, then quickly to <a href="https://ddcutil.com/">ddcutil</a> for free platforms. <a href="https://en.wikipedia.org/wiki/Display_Data_Channel">DDC</a>, or Display Data Channel, is a protocol to exchange control information with a monitor. When you plug in an external monitor on Linux, you might sometimes see new i2c devices appearing as <code>/dev/i2c-*</code>. Note that some monitors appear as USB devices; the procedure is essentially the same.</p>
<p><a href="https://en.wikipedia.org/wiki/I%C2%B2C">I2C</a> is a very simple communication bus and one of the <code>/dev/i2c-*</code> devices is a door to connect to your monitor. And <code>ddcutil</code> knows what to tell it. However, read/write access might require some privileges, for a first test, lets use <code>sudo</code>:</p>
<pre><code class="language-bash"># List displays that can be controlled
sudo ddcutil detect
# Decrease brightness
sudo ddcutil setvcp 10 - 25
# Increase brightness
sudo ddcutil setvcp 10 + 25
</code></pre>
<p>In these executions, <code>ddcutil</code> will scan all devices to find a monitor, then transmit the command.
The magic number <code>10</code> is the identifier of the brightness property, and the sample commands subtract or add 25 to it. More magic in the <a href="https://www.ddcutil.com/command_setvcp/">setvcp</a> documentation.</p>
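<p>To read the current value of a property before changing it, the matching <code>getvcp</code> command should work (same feature code):</p>
<pre><code class="language-bash"># Query the current brightness (VCP feature 10)
sudo ddcutil getvcp 10
</code></pre>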
<p>Finally, ddcutil comes with some sample udev rules so that you can use it from a normal account. See <a href="https://www.ddcutil.com/i2c_permissions/">ddcutil i2c permissions</a>, or <code>/usr/share/ddcutil/data/45-ddcutil-i2c.rules</code> on Arch.</p>
<p>I just had to bind the two <code>setvcp</code> commands to convenient keys and now I can control both the embedded and the external display from my keyboard (albeit with some latency for the external one). Thanks to the authors of ddcctl and ddcutil!</p><h1 id="prettyprinting-with-dominators">Pretty-printing with dominators</h1>
<p>2020-11-14
17:55:44+01:00</p>
<p>A static analysis that I am working on generates complex intermediate data structures. To help debug it, I wrote a few specialized pretty-printers. But these structures rely a lot on sharing (as in <a href="https://en.wikipedia.org/wiki/Hash_consing">hash consing</a>), and the output of the pretty-printers would easily blow up in size, to the extent that it was not helping debugging anymore.</p>
<p>Here is a simple example in javascript to see how things can go wrong:</p>
<pre><code class="language-javascript">leaf = "Some tag"
tree = { l: leaf, r: leaf }
tree = { l: tree, r: tree }
tree = { l: tree, r: tree }
console.info(JSON.stringify(tree, null, 2))
</code></pre>
<p>Which outputs:</p>
<pre><code class="language-javascript">{
"l": {
"l": {
"l": "Some tag",
"r": "Some tag"
},
"r": {
"l": "Some tag",
"r": "Some tag"
}
},
"r": {
"l": {
"l": "Some tag",
"r": "Some tag"
},
"r": {
"l": "Some tag",
"r": "Some tag"
}
}
}
</code></pre>
<p>A tree described in <span class="maths">n</span> steps turns into a JSON file of size <span class="maths">O(2^n)</span>. This is a pathological example, but even practical cases can grow enough to make them very hard to read.</p>
<h2 id="postorder-traversal">Post-order traversal</h2>
<p>To help make sense of it, I had to make sharing explicit in the pretty-printed term. A common notation for that is to use "let binders". The example above could be printed as:</p>
<pre><code class="language-javascript">let n0 = "Some tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = { "l": n1, "r": n1 } in
{ "l": n2, "r": n2 }
</code></pre>
<p>But how do we decide where to introduce them?</p>
<p><a href="https://c9x.me/">c9x</a> suggested a simple solution: introduce the bindings in the post-order traversal of the graph.</p>
<ul>
<li>we visit all nodes, labeling them by their index in the traversal</li>
<li>when we revisit a node, we mark it as shared, to remember to introduce it by a let-binding, and skip its children</li>
<li>we then print all shared nodes using let-binders, ordered by their post-order label.</li>
</ul>
<p>The post-order ensures that children nodes are bound before their parents.</p>
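<p>For the plain JavaScript objects used above, a minimal sketch of this labelling pass could look like this (cycles are not handled, since the structures here are acyclic):</p>
<pre><code class="language-javascript">function analyze(root) {
  const order = new Map()   // node -> post-order index
  const shared = new Set()  // nodes reached more than once
  let next = 0
  function visit(node) {
    if (node === null || typeof node !== "object") return
    if (order.has(node)) { shared.add(node); return } // revisit: mark shared, skip children
    visit(node.l)
    visit(node.r)
    order.set(node, next++)                           // label in post-order
  }
  visit(root)
  return { order, shared }
}
</code></pre>
<p>The printer then emits one let-binding per node in <code>shared</code>, sorted by its <code>order</code> label.</p>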
<p><em>Update:</em> <a href="https://github.com/c-cube">@c-cube</a> remarked that this analysis is suitable to export in <a href="https://github.com/leanprover/lean/blob/master/doc/export_format.md">Lean export format</a></p>
<h2 id="binding-at-dominators">Binding at dominators</h2>
<p>A simple post-order traversal would be good enough for a <em>printer</em>, but it is not really <em>pretty</em>. Shared nodes are all bound at the top-level: we recovered a compact notation but it doesn't preserve the locality of nodes. This example exhibits the problem:</p>
<pre><code class="language-javascript">lleaf = "Left tag"
ltree = { l: lleaf, r: lleaf }
ltree = { l: ltree, r: ltree }
rleaf = "Right tag"
rtree = { l: rleaf, r: rleaf }
rtree = { l: rtree, r: rtree }
tree = { l: ltree, r: rtree }
</code></pre>
<p>It gets pretty-printed by the post-order algorithm as:</p>
<pre><code class="language-javascript">let n0 = "Left tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = "Right tag" in
let n3 = { "l": n2, "r": n2 } in
{ "l": { "l": n1, "r": n1 },
"r": { "l": n3, "r": n3 } }
</code></pre>
<p>Sharing is represented, but it is not that easy to make sense of the structure. It would be much easier to contextualize if related components were close to each other.</p>
<p>Like by printing them in the smallest scope possible... Which, as observed by my friend <a href="https://github.com/trefis">@trefis</a>, would be at the node that dominates them!</p>
<p>If we introduce the let-bindings at the dominating nodes we get:</p>
<pre><code class="language-javascript">{
l: let n0 = "Left tag" in
let n1 = { l: n0, r: n0 } in
{ l: n1, r: n1},
r: let n2 = "Right tag" in
let n3 = { l: n2, r: n2 } in
{ l: n3, r: n3}
}
</code></pre>
<p>Which reads much better in practice.</p>
<h2 id="computing-dominators">Computing dominators</h2>
<p>There are two popular algorithms to compute dominators:</p>
<ul>
<li>The <a href="https://doi.org/10.1145%2F357062.357071">Lengauer-Tarjan</a> algorithm, described in "A fast algorithm for finding dominators in a flowgraph". It has the best known runtime for this problem (<span class="maths">|E|*\alpha(|E|,|V|)</span>, like union-find).</li>
<li>The more recent algorithm proposed by Cooper, Harvey & Kennedy, <a href="http://www.hipersoft.rice.edu/grads/publications/dom14.pdf">"A Simple, Fast Dominance Algorithm"</a>. Its worst case is <span class="maths">O(n^2)</span>, but it is much easier to implement.</li>
</ul>
<p>Furthermore, the simple one is linear for graphs that have a small <a href="https://en.wikipedia.org/wiki/Control-flow_graph#Loop_connectedness">"loop connectedness"</a>. Because the data structures I care about are cycle-free, it is indeed a perfect fit.</p>
<p>If they were cyclic it would be worth considering Lengauer-Tarjan to avoid degenerate cases. And to turn some "let" bindings into "let-rec" ones 🙂.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Dominators are useful for pretty-printing directed graphs in a textual form:</p>
<ul>
<li>explicit sharing reduces the size of the output and makes it easier to digest,</li>
<li>introducing the binders in the dominators reveals the shape of sharing and cycles.</li>
</ul>
<p>I was not expecting a dominance problem to appear in the middle of a pretty-printing algorithm. This was a pleasant discovery that, in retrospect, seems kind of obvious.</p>
<p><em>Thanks to <a href="https://github.com/Armael">@Armael</a> for some corrections</em></p>https://def.lakaban.net/2020-11-14-pretty-printing-with-dominators2020-11-14T00:00:00Z2020-11-14T00:00:00ZNottui & Lwd at ML Workshop 2020<h1 id="nottui--lwd-at-ml-workshop-2020">Nottui & Lwd at ML Workshop 2020</h1>
<p>2020-09-06
13:40:47+02:00</p>
<p>Last week, the ML & OCaml workshops were held as part of ICFP 2020.</p>
<p>There I presented "Nottui & Lwd - A friendly toolkit for the ML programmer".</p>
<p>Nottui builds on top of Notty to make user interfaces in the terminal.
Lwd is an abstraction for making "interactive documents", a limited form of reactivity that proved suitable as an alternative to the "DOM" (without diffing).</p>
<p>Links:</p>
<ul>
<li><p><a href="https://github.com/let-def/lwd">Github repository</a></p>
</li>
<li><p>the recording is available on <a href="https://www.youtube.com/watch?v=w7jc35kgBZE">Youtube</a> (<a href="ml2020.mp4">local copy</a>)</p>
</li>
<li><p>the <a href="slides.pdf">slides</a> that were presented</p>
</li>
<li><p>the <a href="proposal/proposal.pdf">proposal</a> that was submitted to OCaml workshop</p>
</li>
<li><p><a href="https://icfp2020workshops-unofficial.zulipchat.com/#narrow/stream/254450-ML-Workshop/topic/Nottui.20.26.20Lwd">Q&A transcription</a></p>
</li>
</ul>
<hr />
<p><img src="citty-2-2.png" alt="Citty is a terminal frontend to OCamllabs continuous integration service" /></p>
<p>Citty is a terminal frontend to OCamllabs' continuous integration service. The interface is rendered by Nottui & Lwd.</p><h1 id="inuit-textual-user-interfaces-ocaml-workshop-2016">Inuit: textual user interfaces, OCaml workshop 2016</h1>
<p>2020-09-05
12:00:00+02:00</p>
<p>Inuit is a library I developed a few years ago to introspect the internal state of running applications. At its core is an abstraction representing an interactive text buffer.</p>
<p>While doing some cleanup, I found the <a href="poster/poster.pdf">poster</a> that I submitted at OCaml workshop 4 years ago.</p>
<p>In this <a href="demo.mp4">demo</a> it is used to visualize the signature of an OCaml module with interactive folding.</p>
<p><img src="demo.jpg" alt="Inuit demo" /></p>
<p>The library is no longer developed as I am now focusing on <a href="../2020-09-06-nottui-lwd-at-ml-workshop-2020">Nottui & Lwd</a>.</p><h1 id="cuite-design-1-qobject-in-ocaml">Cuite design (1/?): QObject in OCaml</h1>
<p>2020-05-10
18:39:59+02:00</p>
<p>Two years ago, I worked on <a href="https://gitea.lakaban.net/def/cuite">"Cuite"</a>, an OCaml binding to Qt5. It stalled when I got to the point where all core concepts were mapped to OCaml. The remaining work was very repetitive: go through the huge hierarchy of Qt classes and bind each method, accommodating the occasional ad-hoc behavior.</p>
<p>There are also some shortcomings to revisit in my approach:</p>
<ul>
<li>Mapping between C++ and OCaml types is quite ad-hoc; there is no principled way to handle all the variations (some types behave like values, some like references, some exist as part of a graph, some make sense on their own, etc.).</li>
<li>The runtime support library relies a lot on internals of OCaml runtime, and would benefit from a cleanup.</li>
<li>The lack of ad-hoc polymorphism means that C++ method invocation has to be very explicit (e.g. <code>foo->setBar(baz)</code> translates to <code>Foo.setBar foo baz</code>). Also, the huge number of methods sometimes significantly slows down compilation.</li>
</ul>
<p>This post is the first of a series where I explain the thoughts that went in the design of the library and how these issues are addressed.</p>
<h2 id="exposing-qobjects">Exposing QObjects</h2>
<p>QObject is the root of the main class hierarchy in Qt. It is used everywhere: all widgets are QObject instances.</p>
<p>The binding needs to expose QObject classes, instances and functions to OCaml programs. In this post we will take a look at memory management: how QObjects are allocated and released when manipulated from OCaml.</p>
<p>There are a few properties that I wanted the binding to preserve. This is subjective, another binding might look for other properties. Here is what I was looking for:</p>
<ul>
<li>Runtime safety. Incorrect use of the API should translate to an exception, not to a segmentation fault or memory corruption.</li>
<li>Automatic memory management with opt-out. Most of the time, programmers should not have to worry about memory management. Occasionally, they might want to make sure memory is released on time. For instance, when allocating large objects such as pictures, it is nice to release memory as early as possible.</li>
<li>No arbitrary restrictions or ad-hoc rules for objects (unless there is no alternative). Programmers should not worry about cyclic references or have to manage certain objects differently (except maybe for performance reasons).</li>
<li>QObjects should interact well with other OCaml features. Physical equality, ordering, and hashing should make sense.</li>
</ul>
<p>I ended up with a scheme that provides all these properties to the binding. The rest of the post focuses on memory management for QObjects.</p>
<h2 id="qobject-values">QObject values</h2>
<p>Each QObject instance visible from the OCaml program is mapped to a unique value. This graph shows all the infrastructure involved.</p>
<p><img src="qobject-repr.svg" alt="Exposing a QObject to OCaml world" /></p>
<p>An instance <code>QObject *obj</code> is made accessible from OCaml code via the <code>mlproxy</code> value. In other words, we want the functions:</p>
<pre><code class="language-C++">value Val_QObject(QObject *obj);
QObject *QObject_val(value v);
</code></pre>
<h3 id="qobjectval-from-value-to-qobject">QObject<em></em>al: from value to QObject</h3>
<p>The OCaml block <code>mlproxy</code> contains a pointer to an object <code>cproxy</code> in the C++ heap. In turn <code>cproxy</code> has a pointer to <code>obj</code>, the <code>QObject</code>.</p>
<p>To get to the <code>QObject</code> from the OCaml value we just need to follow two pointers.</p>
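<p>Purely as an illustration (the actual representation in Cuite may differ), a sketch of this direction could look like the following, assuming <code>mlproxy</code> is a custom block whose data is the <code>cproxy</code> pointer, and that <code>cproxy</code> tracks the object with a <code>QPointer</code>:</p>
<pre><code class="language-c++">#include <caml/mlvalues.h>
#include <caml/custom.h>
#include <caml/fail.h>
#include <QObject>
#include <QPointer>

struct CProxy {
  QPointer<QObject> obj; // cleared by Qt when the object is deleted
  int weakid;            // index in the OCaml WeakTable, -1 if not exported yet
};

QObject *QObject_val(value v)
{
  // First hop: OCaml block -> C++ proxy
  CProxy *proxy = *(CProxy **)Data_custom_val(v);
  // Second hop: proxy -> QObject, failing if it was deleted in the meantime
  if (proxy->obj.isNull())
    caml_invalid_argument("QObject_val: object was deleted");
  return proxy->obj;
}
</code></pre>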
<h3 id="handling-qobject-destruction">Handling QObject destruction</h3>
<p>We need to keep track of when the QObject is deleted: the OCaml value might still be reachable and we don't want to accidentally dereference the QObject past that point.</p>
<p>This is not too difficult; we can either:</p>
<ul>
<li>Use a <code>QPointer<QObject></code> instead of a <code>QObject*</code>: Qt will clear the <code>QPointer</code> on object deletion.</li>
<li>Listen on the <a href="https://doc.qt.io/qt-5/qobject.html#destroyed">destroyed</a> signal of the QObject.</li>
</ul>
<h3 id="from-qobject-to-cproxy">From QObject to CProxy</h3>
<p>The <code>Val_QObject</code> function will be invoked many times; we don't want to create a new proxy each time. The <code>ProxyTable</code> remembers the <code>CProxy</code> associated with a <code>QObject</code>. It is a hash-table indexed by object addresses. It is populated by the helper function:</p>
<pre><code class="language-c++">static CProxy *QObject_proxy(QObject *obj);
</code></pre>
<p><code>QObject_proxy</code> starts by looking up the hash-table. If a valid proxy is found, it is returned. Otherwise, the object has not yet been exported to the OCaml world. We allocate, initialize, and add a new <code>CProxy</code> to the table. The <code>weakid</code> field is initialized to <code>-1</code>.</p>
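<p>A possible shape for this helper, again only a sketch (reusing the <code>CProxy</code> layout sketched above and a <code>QHash</code> keyed by object address; cleanup of entries on object destruction is omitted):</p>
<pre><code class="language-c++">#include <QHash>

static QHash<QObject *, CProxy *> proxy_table; // the ProxyTable

static CProxy *QObject_proxy(QObject *obj)
{
  CProxy *&slot = proxy_table[obj];  // default-constructed to nullptr if absent
  if (!slot)
  {
    slot = new CProxy;
    slot->obj = obj;                 // QPointer, cleared by Qt on deletion
    slot->weakid = -1;               // not yet exported to the OCaml heap
  }
  return slot;
}
</code></pre>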
<h3 id="from-cproxy-to-value-the-weakid-field">From CProxy to value: the weakid field</h3>
<p>We have a <code>CProxy</code>, but not yet an OCaml value. The <code>weakid</code> field is an index in the <code>WeakTable</code>, a global OCaml table that weakly references <code>MLProxy</code> values:</p>
<ul>
<li>If the field is not <code>-1</code>, a cell is already allocated. We can look directly in the weak table.</li>
<li>If the field is <code>-1</code>, we allocate and initialize a new <code>MLProxy</code> value that points to the CProxy and index it in the weak table.</li>
</ul>
<p>This is done from a primitive exported by OCaml code that also registers a finalizer to handle the cleanup of unreachable objects:</p>
<pre><code class="language-ocaml">val finalize_and_index : ml_proxy -> int
</code></pre>
<!--In pseudo-code: ```c
value Val_QObject(QObject *obj)
{
CAMLparam0;
CAMLlocal1(result);
cproxy *cproxy = QObject_proxy(obj);
if (cproxy->weakid = -1)
result = weak_cell(cproxy->weakid);
else
{
result = alloc_mlproxy(cproxy);
cproxy->weakid =
caml_callback(initialize_and_allocate_id, result);
}
CAMLreturn(result);
} ```-->
<p>Why go through the hoops of this weak table? Because C++ code needs to access the OCaml values, but normal roots are strong references. That would prevent <code>MLProxy</code> values from ever being collected by the GC.</p>
<h2 id="qobjectvalvalqobject-">QObject<em>val/Val</em>QObject: ✔️</h2>
<p>We now have both functions:</p>
<pre><code class="language-c++">value Val_QObject(QObject *obj);
QObject *QObject_val(value v);
</code></pre>
<p>They:</p>
<ul>
<li>can convert from value to QObject and from QObject to value</li>
<li>safely handle explicit QObject deletion</li>
<li>enable automatic deletion of unreachable objects</li>
</ul>
<h1 id="compact-lexer-table-representation">Compact lexer table representation</h1>
<p>2020-05-02
15:31:09+02:00</p>
<p>I found surprisingly little information on the transition tables of lexer generators.</p>
<p>There are plenty of resources on the front-end, such as the very nice <a href="https://www.ccs.neu.edu/home/turon/re-deriv.pdf">Regular-expression derivatives reexamined</a> paper.</p>
<p>However, resources on the transition table are much scarcer. Eventually, I found two references: <a href="https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools">The Dragon Book</a>, which explains a clever scheme for packing the table, and <a href="https://github.com/ocaml/ocaml/blob/trunk/lex/compact.ml">OCamllex</a>, which implements it<sup>1</sup>.</p>
<p><em>Update:</em> The <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.8713&rep=rep1&type=pdf">Software and Hardware Techniques for Efficient Polymorphic Calls</a> thesis analyses a variant of the technique described in this post to store the dispatch tables of object-oriented languages. "Row displacement" proves to be very efficient in a closed world and extends well to multiple inheritance.</p>
<p>[^fn1]: Actually, I believe that the pseudo-code in the Dragon Book is wrong. There should be no recursive call to <code>nextState</code>, instead the default state should be returned directly. This is what OCamllex does.</p>
<h2 id="the-transition-table">The transition table</h2>
<p>The lexer generator frontend produces a deterministic finite automaton (<a href="https://en.wikipedia.org/wiki/Deterministic_finite_automaton">DFA</a>). Transitions are labeled by symbols from the input alphabet (a-z characters in the illustration below). Here is a trivial DFA recognizing the word "hello":</p>
<p><img src="lexer-automaton.gif" alt="Lexer automaton" /></p>
<p>We start from state 0 (the initial state). Then we follow the transitions until:</p>
<ul>
<li><strong>acceptance</strong>: if we reach state 5, the word "hello" has been recognized</li>
<li><strong>rejection</strong>: if we reach state 6, recognition failed</li>
</ul>
<p>The animation below shows the process of recognizing two words:</p>
<ul>
<li>success with "hello" input
<img src="lexer-hello.gif" alt="Accepting word "hello"" /></li>
<li>failure with "hey"
<img src="lexer-hey.gif" alt="Rejecting word "hey"" /></li>
</ul>
<p>We need an efficient way to store and follow these transitions.</p>
<h2 id="naive-representation">Naive representation</h2>
<p>The simplest representation is a matrix indexed by states and characters. In C that looks like:</p>
<pre><code class="language-c">// state_t is the type representing a state
// 256 because we work with 8-bit characters
state_t transition_table[MAX_STATES][256];
state_t next_state(state_t current, uint8_t input)
{
return transition_table[current][input];
}
</code></pre>
<p>This is efficient in time but not in space. The difficulty lies in finding a compact representation that does not compromise speed:</p>
<ul>
<li>Transitions will be followed for every input byte. This is the hottest part of the lexing process.</li>
<li>Practical languages can grow to thousands of states. The matrix then takes a few megabytes of memory.</li>
</ul>
<p>Here is the matrix for the "hello" example:</p>
<table class="std">
<tr><th>\ </th><th>a...d </th><th>e </th><th>f,g </th><th>h </th><th>i,j,k </th><th>l </th><th>m,n </th><th>o </th><th>p...z </th></tr>
<tr><td>0 </td><td>6 </td><td>6 </td><td>6 </td><td>1 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>1 </td><td>6 </td><td>2 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>2 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>3 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>3 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>4 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>4 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>5 </td><td>6 </td></tr>
</table>
<p>We can see that it is very explicit and very redundant. A transition is very likely to target state 6!</p>
<h2 id="sparse-representation">Sparse representation</h2>
<p>The Dragon Book suggests representing each transition vector (a row of the table above) sparsely:</p>
<ul>
<li><strong>default transition</strong>: remember the most common target destination</li>
<li><strong>non-default transitions</strong>: store only the transitions that differs</li>
</ul>
<h3 id="with-associative-lists">With associative lists</h3>
<p>The sparse vectors can be represented with a default value and an associative list for storing non-default transitions.</p>
<p>The table becomes:</p>
<table class="std">
<tr><th></th><th>default </th><th>Transitions </th></tr>
<tr><td>0 </td><td>6 </td><td>(h, 1) </td></tr>
<tr><td>1 </td><td>6 </td><td>(e, 2) </td></tr>
<tr><td>2 </td><td>6 </td><td>(l, 3) </td></tr>
<tr><td>3 </td><td>6 </td><td>(l, 4) </td></tr>
<tr><td>4 </td><td>6 </td><td>(o, 5) </td></tr>
</table>
<p>Much more compact!</p>
<p>But there is a performance problem: for each transition, we have to iterate the list looking for a match. A list can be as big as the size of the alphabet. That would lead to unpredictable and often slow performance – unacceptable.</p>
<h3 id="with-overlapping-vectors">With overlapping vectors</h3>
<p>The Dragon Book comes to the rescue and introduces a clever scheme that retains the performance of array-based lookup with the compactness of sparse vectors. The scheme is as follows:</p>
<ul>
<li>Store all vectors in the same array</li>
<li>Offset them so that no two non-default transitions overlap</li>
<li>Annotate the non-default transitions with their source state</li>
</ul>
<p>With this mechanism, the automaton looks like:</p>
<ul>
<li><p>State table:</p>
<table class="std">
<tr><th></th><th>Default </th><th>Offset </th></tr>
<tr><td>0 </td><td>6 </td><td>0 </td></tr>
<tr><td>1 </td><td>6 </td><td>0 </td></tr>
<tr><td>2 </td><td>6 </td><td>0 </td></tr>
<tr><td>3 </td><td>6 </td><td>1 </td></tr>
<tr><td>4 </td><td>6 </td><td>0 </td></tr>
</table>
</li>
<li><p>Transition table:</p>
<table class="std">
<tr><th>index </th><th style="text-align: center">0...3 </th><th style="text-align: center">4 </th><th style="text-align: center">5,6 </th><th style="text-align: center">7 </th><th style="text-align: center">8,9,10 </th><th style="text-align: center">11 </th><th style="text-align: center">12 </th><th style="text-align: center">13 </th><th style="text-align: center">14 </th><th style="text-align: center">15..26 </th></tr>
<tr><td>source </td><td style="text-align: center">Ø </td><td style="text-align: center">1 </td><td style="text-align: center">Ø </td><td style="text-align: center">0 </td><td style="text-align: center">Ø </td><td style="text-align: center">2 </td><td style="text-align: center">3 </td><td style="text-align: center">Ø </td><td style="text-align: center">4 </td><td style="text-align: center">Ø </td></tr>
<tr><td>target </td><td style="text-align: center">Ø </td><td style="text-align: center">2 </td><td style="text-align: center">Ø </td><td style="text-align: center">1 </td><td style="text-align: center">Ø </td><td style="text-align: center">3 </td><td style="text-align: center">4 </td><td style="text-align: center">Ø </td><td style="text-align: center">5 </td><td style="text-align: center">Ø </td></tr>
</table>
<p>(Using Ø: any value that does not represent a valid state)</p>
</li>
</ul>
<p>We avoid the waste of the naive matrix by filling the unused cells of sparse vectors with the content of others. And we keep the fast access characteristics of arrays.</p>
<p>Here is the mapping between index and characters at offset 0 and 1:</p>
<table class="std">
<tr><th style="text-align: left">index </th><th style="text-align: center">0 </th><th style="text-align: center">1,2,3 </th><th style="text-align: center">4 </th><th style="text-align: center">5,6 </th><th style="text-align: center">7 </th><th style="text-align: center">8,9,10 </th><th style="text-align: center">11 </th><th style="text-align: center">12 </th><th style="text-align: center">13 </th><th style="text-align: center">14 </th><th style="text-align: center">15..25 </th><th>26 </th></tr>
<tr><td style="text-align: left">at offset 0 </td><td style="text-align: center">a </td><td style="text-align: center">b,c,d </td><td style="text-align: center">e </td><td style="text-align: center">f,g </td><td style="text-align: center">h </td><td style="text-align: center">i,j,k </td><td style="text-align: center">l </td><td style="text-align: center">m </td><td style="text-align: center">n </td><td style="text-align: center">o </td><td style="text-align: center">p..z </td><td></td></tr>
<tr><td style="text-align: left">at offset 1 </td><td style="text-align: center"></td><td style="text-align: center">a,b,c </td><td style="text-align: center">d </td><td style="text-align: center">e,f </td><td style="text-align: center">g </td><td style="text-align: center">h,i,j </td><td style="text-align: center">k </td><td style="text-align: center">l </td><td style="text-align: center">m </td><td style="text-align: center">n </td><td style="text-align: center">o..w </td><td>z </td></tr>
</table>
<p>States 0, 1, 2, and 4 have been given offset 0. Their non-default transitions never conflict: rather than having a separate 26-element vector for each of them, we can overlap all of them in the same vector.</p>
<p>State 3 is more complicated. It cannot be at offset 0: it has a transition on <em>l</em> that would end up at column 11, but this column is already used by state 2. However, column 12, just after it, is not used by any other state. So we offset the state by 1, shifting the meaning of characters: <em>l</em> at offset 1 maps to column 12. (It coincides with <em>m</em> at offset 0, but no state has a transition on <em>m</em>.)</p>
<p>With offsets, all transitions can fit in a single vector of 27 elements. Each cell is a bit larger because it stores a pair of states (a source and a target).</p>
<p>The implementation is now:</p>
<pre><code class="language-c">typedef struct {
state_t default_;
int offset;
} state_desc;
typedef struct {
state_t source, target;
} transition_t;
state_desc state_table[MAX_STATES];
transition_t transition_table[MAX_TRANSITIONS];
state_t next_state(state_t current, uint8_t input)
{
int index = state_table[current].offset + input;
if (transition_table[index].source == current)
return transition_table[index].target;
else
return state_table[current].default_;
}
</code></pre>
<p>The tables are a bit harder to generate than the naive matrix. How do we find the right offsets? A simple greedy strategy gives good packings (a sketch in C follows the list):</p>
<ul>
<li>Start from the first vector</li>
<li>Try to fit it at offset 0:
<ul>
<li>If there is no overlap, done</li>
<li>If it overlaps, try again at the next offset</li>
</ul>
</li>
<li>Repeat with the next vector, until all vectors are packed</li>
</ul>
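<p>Here is a minimal sketch of this first-fit packing, assuming each sparse vector is described by the list of characters on which it has a non-default transition (the names are illustrative, not the post's actual generator):</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

/* occupied[i] tells whether cell i of the shared array is already used.
 * chars[0..n-1] are the characters on which this state has a non-default
 * transition.  Returns the first offset where the vector fits, or -1. */
int pack_vector(bool *occupied, int array_size, const uint8_t *chars, int n)
{
    for (int offset = 0; offset + 255 < array_size; offset++) {
        bool fits = true;
        for (int i = 0; i < n && fits; i++)
            fits = !occupied[offset + chars[i]];
        if (!fits)
            continue;                             /* overlap: try the next offset */
        for (int i = 0; i < n; i++)
            occupied[offset + chars[i]] = true;   /* claim the cells */
        return offset;
    }
    return -1;                                    /* the shared array is too small */
}
</code></pre>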
<h2 id="engineering-tricks">Engineering tricks</h2>
<p>Algorithmically, this solution is satisfying. I went a bit further to make it more hardware-friendly while maintaining a good space/time trade-off.</p>
<p>Something we did not specify above is the size of each type. How many bits for a <code>state_t</code>? OCamllex has hard-coded limits that can be reached on big yet realistic languages. These limits save space but make the lexer less flexible. I wanted more freedom here.</p>
<p>I set myself the goal of storing everything in a single array of 32-bit values. I ended up with 23 bits for offsets. This allows for a theoretical maximum of ~8 million transitions (2^23), using up to 32 MiB.</p>
<h3 id="1-disambiguate-using-characters">1. Disambiguate using characters</h3>
<p>Rather than storing a source state in a transition to distinguish non-default from default transitions, we store an input character: a transition is non-default if we reached it by following this input character. I call it the input <em>disambiguator</em>.</p>
<pre><code class="language-c">typedef struct {
uint8_t input;
state_t target;
} transition_t;
state_t next_state(state_t current, uint8_t input)
{
int index = state_table[current].offset + input;
if (transition_table[index].input == input)
return transition_table[index].target;
else
return state_table[current].default_;
}
</code></pre>
<p>By itself, this change only shaves a few bits off a transition cell. It also forces us to store each state at a different offset (otherwise lookups would be ambiguous). For the "hello" example, the offsets are now <em>(0, 1, 2, 3, 4)</em>.</p>
<p>But we replaced a vector of states by a vector of characters. There can be many states but there are only 256 characters. We can exploit this in the low-level representation.</p>
<h3 id="2-represent-states-by-their-offsets">2. Represent states by their offsets</h3>
<p>Now that each state has a unique offset we can directly represent them using offsets, rather than consecutive numbers.</p>
<p>We get rid of the <code>offset</code> entry in the state table and store the <code>default_</code> transition as if it were on character "-1", just before the offset:</p>
<ul>
<li><code>transition_table[offset + c]</code>: transition information from state <code>offset</code> and input character <code>c</code></li>
<li><code>transition_table[offset - 1]</code>: default transition for state <code>offset</code></li>
</ul>
<p>The <code>input</code> <em>disambiguator</em> for <code>transition_table[offset - 1]</code> is chosen so that it does not coincide with a non-default transition of another valid state. In other words, <code>offset - 1 - transition_table[offset - 1].input</code> should not be the offset of another state.</p>
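<p>As a concrete reading of this constraint (a hypothetical helper, not code from the post), a generator could validate a candidate disambiguator for a default cell like this, where <code>is_state_offset</code> is an assumed predicate telling whether an index is the offset of a valid state:</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

/* The default cell of the state at `offset` lives at index offset - 1 and
 * carries the disambiguator `input`.  If offset - 1 - input were the offset
 * of another valid state, that state would wrongly read the cell as its own
 * non-default transition on `input`. */
bool default_cell_is_unambiguous(int offset, uint8_t input,
                                 bool (*is_state_offset)(int))
{
    return !is_state_offset(offset - 1 - input);
}
</code></pre>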
<p>Everything fits in a single array now:</p>
<pre><code class="language-c">typedef struct {
uint8_t input;
state_t target;
} transition_t;
transition_t transition_table[MAX_TRANSITIONS];
state_t next_state(state_t current, uint8_t input)
{
if (transition_table[current + input].input == input)
return transition_table[current + input].target;
else
return transition_table[current - 1].target;
}
</code></pre>
<p>By making the state fit in 24 bits, we can represent a transition in a single 32-bit value:</p>
<pre><code class="language-c">typedef int32_t state_t;
typedef struct {
uint8_t input : 8;
state_t target : 24;
} transition_t;
</code></pre>
<h3 id="3-negative-numbers-for-special-actions">3. Negative numbers for special actions</h3>
<p>In the example, states 5 and 6 have a special meaning: accepting or rejecting the input. From the point of view of the automaton they do the same: terminate the analysis and yield control back to the caller. It is the caller that will act differently based on the reason for the termination.</p>
<p>Thus the automaton does not assign any meaning to special transitions other than stopping the analysis. The driver, on the other hand, can have many actions. For instance:</p>
<ul>
<li>backtracking: remember the current state, continue the analysis, and if a rejection state is reached later, fall back to the remembered state and act as if it were accepting</li>
<li>tagging: mark the current state as a "point of interest" for the program, and resume the analysis. This can be used to implement capture groups</li>
</ul>
<p>The special transitions just need to be distinguished from normal states. For this, I simply chose to use negative values, which cannot represent states. This reduces the number of usable bits in a <code>state_t</code> to 23 (for a maximum table size of 32 MiB).</p>
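<p>The post does not spell out how the actions themselves are numbered. As a purely illustrative convention (an assumption, not the author's encoding), one could store action number <em>k</em> as the negative value <em>-(k + 1)</em>, so the driver can tell actions and states apart and recover <em>k</em>:</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

typedef int32_t state_t;   /* same 32-bit state type as above */

/* Hypothetical encoding: action k is stored as -(k + 1), so every special
 * value is strictly negative and can never collide with a valid state. */
static inline state_t encode_action(int k)     { return -(k + 1); }
static inline bool    is_action(state_t s)     { return s < 0; }
static inline int     decode_action(state_t s) { return -s - 1; }
</code></pre>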
<h3 id="handling-endoffile">Handling end-of-file</h3>
<p>The end-of-file condition (EOF) is reached when there is no more input to feed to the automaton. That can happen at any time, so we should always be ready to handle EOF.</p>
<p>Special actions behave like extra states; EOF behaves like an extra transition.</p>
<p>OCamllex deals with EOF like any other input symbol, by using an alphabet of 257 symbols. I chose to treat EOF differently:</p>
<ul>
<li>To keep using 8-bit integers for the input disambiguator</li>
<li>EOF is a unique situation: it happens only once per run, and it happens last. It does not have to be on the fast path.</li>
</ul>
<p>The remaining degree of freedom we had in the representation of states is the input disambiguator. We use it to encode the EOF transition.</p>
<p>We make it point to any unused transition cell, which is re-purposed to hold the EOF destination state. The disambiguator of this EOF cell can be anything, as long as it is not ambiguous. We end up with a different transition function for EOF:</p>
<pre><code class="language-c">state_t eof_state(state_t current)
{
int idx = transition_table[current - 1].input;
return transition_table[current - 1 - idx].target;
}
</code></pre>
<p>All these optimizations put more pressure on the packing algorithm. But the added freedom can reduce fragmentation in the sparse array, and in practice many states have the same EOF transition:</p>
<ul>
<li>The packing algorithm can share a single EOF cell among many states, improving efficiency.</li>
<li>The original scheme, the one with several tables, has to give a different offset to each state. What at first seemed like a drawback of the single-table scheme also happens in practice in the original one.</li>
</ul>
<p>There is one last optimization we could do for storing EOF transitions. Because EOF happens at the end of the analysis, it only makes sense for EOF transitions to target special actions.</p>
<p>We can therefore use this extra bit of information to introduce more sharing of EOF transitions: an EOF transition that targets a regular state is interpreted as a default transition, and we then repeat the lookup for an EOF transition from that default state.</p>
<pre><code class="language-c">state_t eof_state(state_t current)
{
while (1)
{
int offset = transition_table[current - 1].input;
int eof_index = current - 1 - offset;
state_t target = transition_table[eof_index].target;
if (target <= 0)
return target;
current = transition_table[current - 1].target;
}
}
</code></pre>
<p>This complicates the packing scheme for diminishing returns. I did not bother implementing it.</p>
<h2 id="final-implementation">Final implementation</h2>
<p>Putting everything together, I got this implementation for the core loop of the lexer:</p>
<pre><code class="language-c">typedef int32_t state_t;
typedef uint32_t transition_t;
#define SRC(transition) ((transition) & 0xFF)
#define DST(transition) ((int32_t)(transition) >> 8)
state_t follow(transition_t *table, state_t state,
unsigned char **buf, unsigned char *end)
{
unsigned char *ptr = *buf;
while (ptr < end && state > 0)
{
unsigned char c = *ptr++;
transition_t def = table[state - 1];
transition_t nxt = table[state + c];
state = DST((SRC(nxt) == c) ? nxt : def);
}
*buf = ptr;
return state;
}
state_t follow_eof(transition_t *table, state_t state)
{
int idx = SRC(table[state - 1]);
return DST(table[state - 1 - idx]);
}
</code></pre>
<p>The interpretation function consumes as many characters as possible. This reduces the interpretation overhead (the cost of entering and leaving the interpretation function). We want to spend most of the time in the hot loop!</p>
<p>Note that the loop is quite machine-friendly:</p>
<ul>
<li>The two loads can be issued in parallel</li>
<li>State selection compiles to branch-less code</li>
</ul>
<p>The only branching is the check for the exit condition. It is unavoidable but it happens once and is well predicted.</p>
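<p>To see how the two functions fit together, here is a minimal sketch of a caller (the single-buffer setup and the <code>handle_action</code> callback are assumptions, not part of the post):</p>
<pre><code class="language-c">void handle_action(state_t action);  /* hypothetical: interpret the special action */

/* Run the automaton over a single buffer, then take the EOF transition if
 * the input was exhausted before reaching a special action. */
void run_lexer(transition_t *table, state_t initial,
               unsigned char *buf, unsigned char *end)
{
    state_t state = follow(table, initial, &buf, end);
    if (state > 0)                  /* input exhausted in a regular state */
        state = follow_eof(table, state);
    /* state now encodes a special action; what it means (accept, reject,
     * backtrack, tag, ...) is up to the caller. */
    handle_action(state);
}
</code></pre>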
<h1 id="conclusion">Conclusion</h1>
<p>I presented some techniques for storing the transition table of a lexer. The main result is a simple 40-year-old scheme. It is effective and a few adjustments make it perform even better on modern hardware.</p>
<p>I apologize for not having benchmark figures to show... I did not want to spend the time implementing a production-grade lexing engine. I was just interested in playing around with the full pipeline rather than stopping after the frontend. If I ever need to design a complete lexer, I have a clear picture of what it should look like.</p>
<p>In the future, I plan to tackle some useful extensions like extraction and lookahead (along the lines of <a href="https://re2c.org/2017_trofimovich_tagged_deterministic_finite_automata_with_lookahead.pdf">Tagged Deterministic Finite Automata with Lookahead</a>).</p>
<h2 id="going-further">Going further</h2>
<p>To handle UTF-8 and other character encodings, I came to the conclusion that the best approach was to generate the automaton for a fixed encoding (e.g. a normalized form of UTF-8), with a preprocessing step to convert the input. The automaton would still work on an 8-bit alphabet, possibly simulating a single codepoint with multiple transitions.</p>
<p>Out of curiosity, I tried to represent transitions using various forms of packed intervals on which to do binary search: basically a sorted sequence of <em>(first codepoint, last codepoint, target state)</em>. This is a cheap way to handle large alphabets, but I did not manage to make it competitive with the sparse representation, even with clever implementations of binary search like those on <a href="http://pvk.ca/Blog/2015/11/29/retrospective-on-binary-search-and-on-compression-slash-compilation/">PVK's excellent blog</a>. That ruled out the approach for me.</p>https://def.lakaban.net/2020-05-02-compact-lexer-table-representation2020-05-02T00:00:00Z2020-05-02T00:00:00ZUpdating an NVMe firmware<h1 id="mettre-a-jour-un-firmware-nvme">Updating an NVMe firmware</h1>
<p>2020-03-15
06:45:48+01:00</p>
<p>How to update the firmware of a Toshiba SSD on Linux.</p>
<p><strong>Update 2 (25/08/2020):</strong> Via this guide for <a href="https://www.tonymacx86.com/threads/guide-sierra-on-hp-spectre-x360-native-kaby-lake-support.228302/">hackintosh</a>, I found that it is possible to format the NVMe drive with either 512-byte or 4-kilobyte sectors. I reformatted it with 4-kilobyte sectors and performance has been back to normal ever since!
Just run:</p>
<pre><code class="language-bash">$ nvme-format --lbaf=1 /dev/nvme0n1
</code></pre>
<p>The index selects the sector size, as reported by <code>smartctl</code>:</p>
<pre><code>$ smartctl -a /dev/nvme0n1
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
1 - 4096 0 0
</code></pre>
<p><strong>WARNING</strong>: this erases the whole drive! It is obviously not very safe to keep using an SSD that has already shown signs of instability (performance-wise, not data loss, but still...), yet it has been running like a charm in the two months since I reformatted it :-).</p>
<p><strong>Update 1 (26/04/2020):</strong> The firmware update only extended the SSD's life by a few weeks. No data was lost, and this reprieve at least left time to check the drive! Updating a firmware is nevertheless risky, all the more so on a failing drive.</p>
<h4 id="tldr">TLDR:</h4>
<pre><code>$ 7z x Toshiba\ KXG50ZNV256G_KXG50ZNV512G_KXG50ZNV1T02\ 7YXTM_ZPE.exe
$ 7z x XG5-AADA4107-64bit.exe
$ sudo nvme fw-download /dev/nvme0n1 --fw=AADA4107.sig
$ sudo nvme fw-commit /dev/nvme0n1 --slot=0 --action=1
$ reboot
</code></pre>
<p>The Dell XPS 9360 is stuttering more and more. It seems the SSD is to blame.</p>
<p>Switching the I/O scheduler from <code>none</code> to <code>mq-deadline</code> and then <code>kyber</code> significantly reduces latency. Enough to browse the web, but not to watch videos.</p>
<p>A search for this SSD model reveals that firmware updates exist: it is a Toshiba KXG50ZNV256G with firmware AADA4102. A visit to Dell's website lets you download firmware AADA4107:</p>
<p><a href="https://www.dell.com/support/home/fr/fr/frbsdt1/drivers/driversdetails?driverid=7yxtm">Toshiba SSD firmware update for KXG50ZNV256G</a></p>
<p>There is no mention of Linux on that site; fortunately the <a href="https://github.com/linux-nvme/nvme-cli">nvme-cli</a> command-line tools make it possible to update the firmware (a bit thin for a laptop sold at a premium for its Linux support).</p>
<h4 id="obtenir-le-firmware">Obtenir le firmware</h4>
<ol>
<li><p>Download the executable containing the update</p>
</li>
<li><p>Extract the file "AADA4107.sig". It worked for me by opening the executable with file-roller (GNOME's archive manager), then opening the second executable contained inside.
You can also get by with <code>7z</code>:</p>
<pre><code class="language-shell">$ 7z x Toshiba\ KXG50ZNV256G_KXG50ZNV512G_KXG50ZNV1T02\ 7YXTM_ZPE.exe
$ 7z x XG5-AADA4107-64bit.exe
</code></pre>
</li>
</ol>
<p>You should now have the file "AADA4107.sig":</p>
<pre><code class="language-shell">$ stat AADA4107.sig
File: AADA4107.sig
Size: 1601536
</code></pre>
<p>(Toshiba, Dell: in a better world, why not let us download this file directly?)</p>
<h4 id="charger-le-firmware-dans-le-ssd">Charger le firmware dans le SSD</h4>
<p>A quick introduction to the <code>nvme-cli</code> tool:</p>
<ul>
<li><p>everything goes through the <code>nvme</code> binary; if you don't have it yet, it is <code>pacman -S nvme-cli</code> on Arch Linux.</p>
</li>
<li><p>the <code>list</code> subcommand should describe the SSD. Here:</p>
<pre><code>$ nvme list
Node SN Model Namespace Usage Format FW Rev
–––––––––––––––– –––––––––––––––––––– –––––––––––––––––––––––––––––––––––––––– ––––––––– –––––––––––––––––––––––––– –––––––––––––––– ––––––––
/dev/nvme0n1 18MS105XTY5T KXG50ZNV256G NVMe TOSHIBA 256GB 1 256,06 GB / 256,06 GB 512 B + 0 B AADA4107
</code></pre>
</li>
<li><p>the <code>fw-log &lt;device&gt;</code> subcommand describes the current firmware:</p>
<pre><code>$ nvme fw-log /dev/nvme0n1
afi : 0x1
frs1 : 0x3230313441444141 (AADA4102)
frs2 : 0x3230313441444141 (AADA4102)
</code></pre>
</li>
</ul>
<p>The update happens in two steps: load the firmware with <code>fw-download</code>, then activate it with <code>fw-commit</code>.</p>
<pre><code>$ nvme fw-download /dev/nvme0n1 --fw=AADA4107.sig
$ nvme fw-commit /dev/nvme0n1 --slot=0 --action=1
</code></pre>
<p>All that's left is to reboot the computer, and <code>fw-log</code> should show that version <code>AADA4107</code> is now installed:</p>
<pre><code>$ nvme fw-log /dev/nvme0n1
Firmware Log for device:nvme0n1
afi : 0x1
frs1 : 0x3730313441444141 (AADA4107)
frs2 : 0x3230313441444141 (AADA4102)
</code></pre>https://def.lakaban.net/2020-03-14-mettre-a-jour-un-firmware-nvme2020-03-15T00:00:00Z2020-03-15T00:00:00Z