<h1 id="imperative-bbt-part-2-binary-search-trees">Imperative BBT part 2: Binary Search Trees</h1>
<p>2023-03-07
09:00:00 CET</p>
<p>First we define the type of BST labelled with integers:</p>
<pre><code class="language-c">struct bst_node {
struct node node;
int value;
};
struct bst_node *
bst_left(struct bst_node *node)
{
return (struct bst_node *)node->node.left;
}
struct bst_node *
bst_right(struct bst_node *node)
{
return (struct bst_node *)node->node.right;
}
struct bst_node *
bst_make(struct bst_node *self, struct bst_node *left, struct bst_node *right)
{
return (struct bst_node*)make_tree(&self->node, &left->node, &right->node);
}
</code></pre>
<p>These are mostly wrappers over the <code>struct node</code> type.</p>
<h2 id="insertion">Insertion</h2>
<p>Insertion of a node in a BST is simple: if the tree is empty, we simply return the inserted node. Otherwise, we check if we should insert in the left or the right sub-trees by comparing the label.</p>
<p>Like before, the node to insert has to be provided by the caller, so that this code does not deal with allocation.</p>
<pre><code class="language-c">struct bst_node *
bst_insert(struct bst_node *new_node, struct bst_node *tree)
{
if (!tree)
return bst_make(new_node, NULL, NULL);
struct bst_node *left = bst_left(tree);
struct bst_node *right = bst_right(tree);
if (new_node->value < tree->value)
left = bst_insert(new_node, left);
else
right = bst_insert(new_node, right);
return bst_make(tree, left, right);
}
</code></pre>
<p>This code is literally the same as insertion in a plain BST. However, since <code>bst_make</code> uses <code>make_tree</code> under the hood, we get a balanced tree without any effort!</p>
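<p>As a quick sanity check, here is a minimal usage sketch (not part of the original interface): it assumes the snippets above are compiled together with <code>balanced.c</code>/<code>balanced.h</code> from part 1, and it allocates nodes from a local array only for the sake of the example, since any allocation strategy works:</p>
<pre><code class="language-c">#include <stdio.h>
#include "balanced.h"

int main(void)
{
    static struct bst_node nodes[100];
    struct bst_node *root = NULL;
    for (int i = 0; i < 100; ++i)
    {
        // Insert the values 0..99 in a scrambled order
        nodes[i].value = (i * 37) % 100;
        root = bst_insert(&nodes[i], root);
    }
    // With 100 nodes, a balanced tree should have a small height (around 7 to 9)
    printf("height = %d\n", root->node.height);
    return 0;
}
</code></pre>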
<h2 id="deletion">Deletion</h2>
<p>The story is almost the same for deletion: we implement the usual BST deletion algorithm. The deleted node, if any, is returned to the caller for memory management purposes.</p>
<pre><code class="language-c">struct bst_node *
bst_delete(struct bst_node *tree, int value, struct bst_node **deleted)
{
if (!tree)
{
*deleted = NULL;
return NULL;
}
struct bst_node *left = bst_left(tree);
struct bst_node *right = bst_right(tree);
if (value == tree->value)
{
*deleted = tree;
return bst_join(left, right);
}
if (value < tree->value)
left = bst_delete(left, value, deleted);
else
right = bst_delete(right, value, deleted);
return bst_make(tree, left, right);
}
</code></pre>
<p>However, we need a new operation: <code>bst_join</code>, that constructs a tree by concatenating, in order, the nodes of two trees.
This new helper is necessary because after deleting a node, we still need to rebuild a tree from its two sub-trees. We will look at it now.</p>
<h3 id="the-join-operation">The join operation</h3>
<p>The join operation isn't specific to BSTs (it doesn't depend on labels); it can
be defined on any balanced tree. We can implement it by building upon
<code>make_tree</code>, which relieves us from thinking about balancing.</p>
<p>However, it can be useful to look at the height to decide which sub-tree to
traverse. By recursing on the deeper one, we minimize the balancing work
needed... Though this is a minor optimization.</p>
<pre><code class="language-c">struct node *
join_tree(struct node *left, struct node *right)
{
if (!right)
return left;
if (!left)
return right;
if (left->height < right->height)
return make_tree(right, join_tree(left, right->left), right->right);
else
return make_tree(left, left->left, join_tree(left->right, right));
}
</code></pre>
<p>And finally, we lift the operation to work on BSTs:</p>
<pre><code class="language-c">struct bst_node *
bst_join(struct bst_node *left, struct bst_node *right)
{
return (struct bst_node*)join_tree(&left->node, &right->node);
}
</code></pre>
<h2 id="result">Result</h2>
<p>In this post we saw that the primitive <code>make_tree</code> from the first article makes the implementation of <em>balanced</em> BSTs trivial.
The insertion and deletion algorithms are very close to the naive ones, yet we get balancing almost for free.</p><h1 id="imperative-balanced-binary-trees-part-1-core-balancing">Imperative Balanced Binary Trees, part 1: Core balancing</h1>
<p>2023-03-05
08:40:00 CET</p>
<p>I used to find balanced trees tedious to deal with in an imperative setting,
particularly in C, though I love them in functional languages.</p>
<p>In this series I will describe an implementation strategy that achieves all I
could hope for in a C implementation: safe, modular, reasonably fast,
independent of an allocation strategy, easy to extend... and reasonably simple.</p>
<p>Balancing is achieved by calling a <em>single helper function</em>. Other common
functions, like BST insertion and deletion, can be derived from it.</p>
<p>This idea is not new: in <a href="http://arxiv.org/abs/1602.02120">Parallel Ordered Sets Using Join</a>, Blelloch et al
use a similar approach as a building block for parallel implementations of
binary trees, and afaik the core idea goes back to
<a href="https://www.cambridge.org/core/journals/journal-of-functional-programming/article/functional-pearls-efficient-setsa-balancing-act/0CAA1C189B4F7C15CE9B8C02D0D4B54E">Functional Pearls Efficient sets—a balancing act</a> by Adams.
But these papers are concerned with immutable sets, and this blog post is about
adapting the method to C.</p>
<p>The series is split into 4 posts:</p>
<ul>
<li>this post implements balancing and exposes the modular interface</li>
<li>in <a href="2023-03-07-binary-search-trees.md">part 2</a>, we will implement binary search trees (BST)</li>
<li>in part 3, we show how to safely maintain additional structural invariants</li>
<li>in part 4, we extend the approach to other balancing criteria </li>
</ul>
<h2 id="heightbalanced-trees">Height-balanced trees</h2>
<p>We start by defining a simple flavor of height-balanced trees. Here is a node:</p>
<pre><code class="language-c">struct node {
struct node *left, *right;
int height;
};
</code></pre>
<p>A node is represented by a <code>struct node *</code>. <code>NULL</code> denotes the empty tree. A non-empty node has two subtrees and keeps track of its height.</p>
<p>Next, we implement a <code>node_height</code> function to get the height while handling the empty case, and an initialization function that maintains the height invariant:</p>
<pre><code class="language-c">// Returns the height of node, handling the NULL case
static int node_height(struct node *n)
{
return n ? n->height : 0;
}
// precondition: self != NULL
// left and right can be NULL to denote empty sub-trees.
// set_node(self, left, right) setups self so that it represents
// a tree with left and right subtrees and the correct height,
// and returns self.
static struct node *set_node(struct node *self,
struct node *left, struct node *right)
{
self->left = left;
self->right = right;
int lh = node_height(left);
int rh = node_height(right);
self->height = 1 + ((lh > rh) ? lh : rh);
return self;
}
</code></pre>
<p>No other function should directly mutate a <code>struct node</code>, so we can be confident the height is always set correctly.</p>
<h2 id="constructing-balanced-nodes">Constructing balanced nodes</h2>
<p>We will use the height field to enforce the same criterion as the one maintained by AVL trees: a tree is balanced if (1) its two sub-trees are balanced, (2) their heights differ by at most one. </p>
<pre><code class="language-c">// precondition: max_height >= min_height
// is_balanced(min_height, max_height) returns true if a tree
// with sub-nodes of these heights is balanced
static bool is_balanced(int min_height, int max_height)
{
return (max_height - min_height) <= 1;
}
</code></pre>
<p>By relying on "smart constructors", the tree rotation functions can be given a functional feeling:</p>
<pre><code class="language-c">typedef struct node *
make_fun(struct node *self, struct node *left, struct node *right);
static struct node *
rot_left(make_fun make, struct node *self,
struct node *left, struct node *right)
{
return make(right, make(self, left, right->left), right->right);
}
static struct node *
rot_right(make_fun make, struct node *self,
struct node *left, struct node *right)
{
return make(left, left->left, make(self, left->right, right));
}
</code></pre>
<p>Now, the tree balancing functions. To make a balanced node, we check if the two children are already balanced. If yes, not much needs to be done: we directly apply the constructor. If not, we go down the deeper sub-tree and try again.</p>
<pre><code class="language-c">static struct node *
node_left(struct node *self, struct node *left, struct node *right)
{
if (is_balanced(node_height(left), node_height(right)))
return set_node(self, left, right);
if (right && node_height(right->right) < node_height(right->left))
right = rot_right(node_left, right, right->left, right->right);
return rot_left(node_left, self, left, right);
}
static struct node *
node_right(struct node *self, struct node *left, struct node *right)
{
if (is_balanced(node_height(right), node_height(left)))
return set_node(self, left, right);
if (left && node_height(left->left) < node_height(left->right))
left = rot_left(node_right, left, left->left, left->right);
return rot_right(node_right, self, left, right);
}
struct node *make_tree(struct node *self, struct node *left, struct node *right)
{
if (node_height(left) <= node_height(right))
return node_left(self, left, right);
else
return node_right(self, left, right);
}
</code></pre>
<p>And that's it for the balancing! To sum up, the two big tasks are:</p>
<ul>
<li>first, implementing a balancing criterion using a function <code>set_node</code> that maintains some metric used by another function, <code>is_balanced</code>, to check if the criterion is satisfied</li>
<li>then, the <code>make_tree</code> function constructs trees balanced according to the criterion</li>
</ul>
<p>Note that this code does not need to do any memory management: all functions take the resulting tree as an argument (a style sometimes called "destination-passing").</p>
<h2 id="result">Result</h2>
<p>You can download the implementation we reached so far (under MIT license).</p>
<p><a href="balanced.c">balanced.c</a>, <a href="balanced.h">balanced.h</a>:</p>
<pre><code class="language-c">#ifndef BALANCED_H
#define BALANCED_H
struct node {
struct node *left, *right;
int height;
};
struct node *
make_tree(struct node *self, struct node *left, struct node *right);
#endif /*!BALANCED_H*/
</code></pre>
<p>The interface is remarkably simple, isn't it?
In the next post, we will implement binary search trees on top of this interface.</p>
<p>Here is a simple test program (<a href="test.c">test.c</a>, <a href="Makefile">Makefile</a>), to see how this can be used to make custom balanced data structures:</p>
<pre><code class="language-c">#include <stdlib.h>
#include <stdio.h>
#include "balanced.h"
// Always insert at the right end of the tree
struct node *insert_right(struct node *current, struct node *new)
{
if (!current)
return new;
return make_tree(current, current->left, insert_right(current->right, new));
}
// A simple printer, to visually confirm the trees are balanced
void print_node(int indent, struct node *node)
{
for (int i = 0; i < indent; ++i)
fputc(' ', stdout);
fputs("- ", stdout);
if (!node)
fputs("leaf\n", stdout);
else
{
fprintf(stdout, "node, height=%d\n", node->height);
print_node(indent + 1, node->left);
print_node(indent + 1, node->right);
}
}
int main(int argc, char **argv)
{
struct node nodes[1024];
struct node *root = NULL;
for (int i = 0; i < 1024; ++i)
{
// Initialize an empty node
struct node *node = make_tree(&nodes[i], NULL, NULL);
// Insert it at the right end
root = insert_right(root, node);
}
// Print resulting tree
print_node(0, root);
return 0;
}
</code></pre>
<p>Or here is a variant to insert in random locations:</p>
<pre><code class="language-c">struct node *insert_random(int seed, struct node *current, struct node *new)
{
if (!current)
return new;
if ((seed & 3) == 0)
current->left = insert_random(seed >> 1, current->left, new);
else
current->right = insert_random(seed >> 1, current->right, new);
return make_tree(current, current->left, current->right);
}
</code></pre>
<p>Try it by replacing <code>insert_right(root, node)</code> with <code>insert_random(rand(), root, node)</code>.</p><h1 id="high-quality-scrolling-with-emacs">High quality scrolling with Emacs</h1>
<p>2023-03-05
11:30:30 CET</p>
<p><a href="https://doomemacs.org">Doom Emacs</a> convinced me to switch to emacs after being a long time vim user.
Naturally, I spent a lot of time tweaking my emacs setup 😃 and I settled with Emacs
29.0 using <a href="https://www.emacswiki.org/emacs/GccEmacs">native-compilation</a> (see also <a href="https://akrl.sdf.org/gccemacs.html">gcc-emacs</a>).</p>
<p>On macOS, I use a custom built emacs-plus <a href="https://github.com/d12frosted/homebrew-emacs-plus">formula</a>. The specific command line is:</p>
<pre><code class="language-shell">$ brew install emacs-plus@29 --with-xwidgets --with-native-comp
</code></pre>
<p>Native comp speeds things up and Xwidgets adds a built-in web browser based on webkit.
As for the version, 29 feels much snappier than 28. This is more
pronounced on macOS; 28 was OK on Linux. No idea why.</p>
<h2 id="highquality-scrolling">High-Quality Scrolling</h2>
<p>One of the additions in Emacs 29 is the <a href="https://www.emacswiki.org/emacs/SmoothScrolling#h5o-6"><code>pixel-scroll-precision-mode</code></a>. Just
enable it and, if you are using a windowed version of emacs, you should have a
vertical scroll that is pixel-based rather than line-based.</p>
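<p>If your configuration framework does not already expose it as an option, enabling it is a single line in your init file (plain Emacs 29, nothing Doom-specific):</p>
<pre><code class="language-elisp">;; Pixel-based vertical scrolling (Emacs 29+, graphical frames only)
(pixel-scroll-precision-mode 1)
</code></pre>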
<p>This feels much better when using trackpad scrolling:</p>
<p><div align=center><video controls width=435>
<source src="scroll.mp4" type="video/mp4">
Download the <a href="scroll.mp4">MP4</a> video.
</video></div></p>
<h3 id="fixing-wheelbased-horizontal-scrolling-and-text-scaling">Fixing wheel-based horizontal scrolling and text scaling</h3>
<p>By default, Emacs regroups multiple scroll events into a single one large enough
to scroll one line. This produces far fewer events, at a coarser precision.
This goes against the smooth experience sought by <code>pixel-scroll-precision-mode</code>,
so it disables this feature by setting <code>mwheel-coalesce-scroll-events</code> to <code>nil</code>.</p>
<p>Unfortunately, this affects all wheel events, while <code>pixel-scroll-precision-mode</code>
cares only about vertical scrolling. Other wheel-based features go crazy (for
instance, scaling text with a mouse wheel is roughly 20 times faster on my setup,
quite inconvenient). </p>
<p>My tentative fix is to switch coalescing on and off based on the action.</p>
<p>For this I defined two helper functions:</p>
<pre><code class="language-elisp">(defun filter-mwheel-always-coalesce (orig &rest args)
"A filter function suitable for :around advices that ensures only
coalesced scroll events reach the advised function."
(if mwheel-coalesce-scroll-events
(apply orig args)
(setq mwheel-coalesce-scroll-events t)))
(defun filter-mwheel-never-coalesce (orig &rest args)
"A filter function suitable for :around advices that ensures only
non-coalesced scroll events reach the advised function."
(if mwheel-coalesce-scroll-events
(setq mwheel-coalesce-scroll-events nil)
(apply orig args)))
</code></pre>
<p>Before forwarding a scroll event, they check whether
<code>mwheel-coalesce-scroll-events</code> matches the expectation, and either forward the
event or change the configuration. </p>
<p>When switching, the event is dropped, which seems questionable but is actually
preferable in my experience.</p>
<p>Finally, we can advise the wheel sensitive functions accordingly:</p>
<pre><code class="language-elisp">; Don't coalesce for high precision scrolling
(advice-add 'pixel-scroll-precision :around #'filter-mwheel-never-coalesce)
; Coalesce for default scrolling (which is still used for horizontal scrolling)
; and text scaling (bound to ctrl + mouse wheel by default).
(advice-add 'mwheel-scroll :around #'filter-mwheel-always-coalesce)
(advice-add 'mouse-wheel-text-scale :around #'filter-mwheel-always-coalesce)
</code></pre>
<h3 id="horizontal-scrolling">Horizontal scrolling?</h3>
<p>By default horizontal scrolling is not enabled in Emacs.</p>
<p>To change that:</p>
<pre><code class="language-elisp">(setq mouse-wheel-tilt-scroll t)
</code></pre>
<p>If you like reversed / natural scrolling, also set:</p>
<pre><code class="language-elisp">(setq mouse-wheel-flip-direction t)
</code></pre><h1 id="even-more-compact-lexer-table">Even more compact lexer table</h1>
<p>2022-04-14
15:44:30+09:00</p>
<p>Some time ago, I blogged about the <a href="../2020-05-02-compact-lexer-table-representation">representation of lexer tables</a>. That post introduced a common scheme originally described in the Dragon Book, together with a practical implementation.</p>
<p>In a footnote, I mentioned that I thought that the pseudo-code of the Dragon Book was wrong:</p>
<ul>
<li>The traditional compacting scheme encodes transitions as a pair of default state and a sparse vector; the default state is the most common target, and transitions that target it are not explicitly represented.</li>
<li>By recursively calling <code>nextState</code>, the code did not interpret the default state as a target but rather as a fallback state: if the transition is not part of the sparse vector, look again for the same transition in the default state.</li>
</ul>
<p>But recently, I worked on a project which exhibits a variant of the problem for which the Dragon Book typo might be beneficial.</p>
<p>Namely:</p>
<ul>
<li>many states have a lot of transitions in common (for most of the alphabet, they have the same target states, only a few cases differ)</li>
<li>lookup performance is not critical</li>
<li>the alphabet and the automaton are quite large</li>
</ul>
<p>Given these constraints, it is worth spending some time on making the table more compact, at the cost of slightly worse lookups.</p>
<p>This opens up an interesting question: how to construct a compact table with this encoding? How to pick good fallback states?</p>
<h2 id="finding-common-transitions-and-fallback-states">Finding common transitions and fallback states</h2>
<p>Let’s assume we have a set of states <span class="maths">Q</span>, an alphabet <span class="maths">\Sigma</span>, and a transition function <span class="maths">\delta : Q \times \Sigma \rightarrow Q</span>.</p>
<p>For each state <span class="maths">q</span>, we decide to represent the function <span class="maths">\delta(q, \_)</span> as either:</p>
<ul>
<li>a total function <span class="maths">\delta_q : \Sigma \rightarrow Q</span></li>
<li>a pair of a partial function <span class="maths">\delta_q : \Sigma \rightharpoonup Q</span> and a fallback state <span class="maths">f_q : Q</span></li>
</ul>
<p>The original function <span class="maths">\delta</span> can be recovered from this decomposition using:</p>
<p><div class="maths">
\delta(q,a) = \begin{cases}
\delta_q(a) & \text{if } a \in \mathrm{dom}(\delta_q) \\
\delta(f_q,a) & \text{otherwise}
\end{cases}
</div></p>
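<p>In code, a lookup following this definition is a small loop. Here is a sketch with hypothetical table names (the actual layout depends on how the partial functions are packed); it terminates because the transition function of the root is total, so its fallback is never consulted:</p>
<pre><code class="language-c">#include <stdint.h>

typedef int16_t state_t;
#define NO_STATE ((state_t)-1)

// Assumed to be produced by the table generator:
// - sparse_lookup returns the explicit transition of q on a, or NO_STATE;
// - fallback[q] is the parent of q in the spanning tree (unused at the root,
//   whose transition function is total).
extern state_t sparse_lookup(state_t q, uint8_t a);
extern const state_t fallback[];

state_t next_state(state_t q, uint8_t a)
{
    for (;;)
    {
        state_t t = sparse_lookup(q, a);
        if (t != NO_STATE)
            return t;        // transition stored explicitly for this state
        q = fallback[q];     // otherwise, retry in the fallback state
    }
}
</code></pre>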
<p>Let’s define a distance function <span class="maths">d(q_1,q_2)</span> on states:</p>
<p><div class="maths">
d(q_1,q_2) = \lvert \left\{ a \in \Sigma \mid \delta(q_1,a) \neq \delta(q_2,a) \right\} \rvert
</div></p>
<p>The distance function measures the number of transitions in which two states disagree.</p>
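<p>Computed naively, the distance is just a loop over the alphabet. A small sketch, reusing the <code>state_t</code> convention from the sketch above and assuming the dense transition matrix of the front-end is available at construction time:</p>
<pre><code class="language-c">#define SIGMA 256   // alphabet size (8-bit characters)

// d(q1, q2): number of symbols on which the two states disagree
static int distance(const state_t delta[][SIGMA], state_t q1, state_t q2)
{
    int d = 0;
    for (int a = 0; a < SIGMA; ++a)
        if (delta[q1][a] != delta[q2][a])
            d++;
    return d;
}
</code></pre>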
<p>We now consider the dense graph on <span class="maths">Q</span> with weights defined by <span class="maths">d</span>. A <a href="https://en.wikipedia.org/wiki/Minimum_spanning_tree">minimum spanning tree</a> of this graph relates each state to one of its closest neighbors, in terms of shared transitions.</p>
<p>Let's pick a root <span class="maths">q_r</span> and call <span class="maths">\mathrm{parent}: Q\rightarrow Q</span> the function that maps a state to its parent in the minimum spanning tree rooted at <span class="maths">q_r</span>.</p>
<p>For each state <span class="maths">q</span> we can now define:</p>
<ul>
<li><p><span class="maths">\delta_q(a) = \delta(q, a)</span>, if <span class="maths">q</span> = <span class="maths">q_r</span> (the transition function of the root is total)</p>
</li>
<li><p><span class="maths">\delta_q(a) = \delta(q,a)</span>, if <span class="maths">\delta(q,a) \neq \delta(\mathrm{parent}(q), a)</span> (the transition function of a child is defined when it differs from the parent)</p>
</li>
<li><p><span class="maths">f_q = \mathrm{parent}(q)</span> (the fallback state is the parent)</p>
</li>
</ul>
<h2 id="picking-a-root-and-splitting-trees">Picking a root and splitting trees</h2>
<p>The method above already maximizes sharing between transition functions. However, it does nothing to minimize lookup length.</p>
<p>In the worst case, lookup might start from one leaf of the tree and try all nodes on the branch to the root before succeeding. So we would like to minimize the length of the branches.</p>
<p>Various heuristics can be useful to choose the root:</p>
<ul>
<li>A first approximation is to pick a node in the middle of the tree. This minimizes the length of the longest branch (the worst lookup will be half the diameter of the tree).</li>
<li>Assuming that all states are equally likely to be looked up, we can also pick a root that minimizes the depth of nodes.</li>
</ul>
<p>Both of these metrics can be computed easily. When evaluated successively along a branch of the tree, they are unimodal: decreasing as we get closer to the optimal node, increasing afterwards. So a simple traversal lets us pick the candidate root.</p>
<p>Finally, after picking a root, we might still be unhappy with the height of the tree. In this case, we can simply split into two trees (at the cost of losing some sharing), and repeat the process on each tree.</p>
<p>These are all greedy heuristics. They give a good but not optimal decomposition. The <a href="https://cs.stackexchange.com/questions/64791/diameter-constrained-minimum-spanning-tree-problem">minimum diameter minimum spanning tree</a> problem seems to be <a href="https://en.wikipedia.org/wiki/NP-hardness">NP-hard</a> (and some variants <a href="https://en.wikipedia.org/wiki/APX">APX-hard</a>), so I did not spend much time on it; in practice, the greedy results were good enough.</p>
<h2 id="practical-implementation">Practical implementation</h2>
<p>My final implementation uses both a default state and fallback states: the root of the tree has a default state and the children fall back to their parent.</p>
<p>Furthermore, the sharing algorithm is not applied to all states at once. Rather, we group states by their default state (the most common target of their transition function). Then we compute a tree for each group of states with the same default.</p>
<p>On my test, fallback states led to 60% fewer transitions than the default state alone. And as the vectors were much sparser, the packing heuristic performed better (though I did not measure the overhead precisely).</p><h1 id="a-typeof-operator-in-ocaml">A typeof operator in OCaml</h1>
<p>2021-06-25
21:26:20+09:00</p>
<p>Let’s say one is implementing a source-to-source rewriter for OCaml (a preprocessor, like a PPX library) and needs to manipulate the type of an expression. They don’t want to execute the expression, just to refer to its type: something like a <code>type of <expr></code> operator.</p>
<p>OCaml lets you bind the type of a sub-expression to a variable, e.g. <code>(<expr> : 'my_var)</code>, and you can then refer to <code>'my_var</code> in the rest of the expression. But can we do the same in a module and bind the type to a type constructor?</p>
<p>In this blog post, I will give a syntactic construction to realize the "type of" operator:</p>
<pre><code class="language-ocaml">type t = [%typeof expr]
</code></pre>
<p>like we can already do for module:</p>
<pre><code class="language-ocaml">module type T = module type of M
</code></pre>
<p>Menhir, the parser generator, needs something similar to infer the type of semantic actions and non-terminals. It is useful to improve usability and necessary for the <a href="http://gallium.inria.fr/~fpottier/menhir/manual.pdf#subsection.9.3">inspection features</a> (see ‘Inspection API’). To do so, it runs <code>ocaml</code> a first time in isolation to infer the interface of a specially crafted file, then it parses that interface.
This complicates Menhir and the build process <a href="http://gallium.inria.fr/~fpottier/menhir/manual.pdf#section.14">significantly</a> (see ‘Interaction with build systems’) and makes it less flexible. Could we do the same in a single pass, directly in OCaml code?</p>
<h1 id="invocation-of-cthulhu">Invocation of C(++)thulhu</h1>
<p>It turns out that the following encoding does just that:</p>
<pre><code class="language-ocaml">type my_type = [%type_of <<some_expr>>]
~=
include (
(functor (M : sig module type T module X : T end) -> M.X)
(struct
let some_expr () = <<some_expr>>
module type T0 = sig type my_type end
module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
))
module type T = module type of X
end)
)
</code></pre>
<p><code><<some_expr>></code> ranges over expressions and <code>my_type</code> over type names: replace them with the actual expression and the name you want.</p>
<p>Let’s go through it layer by layer, in a top-level. For the sake of this example, we will try to infer the type of <code>5</code>, e.g. implementing <code>type my_type = [%type_of 5]</code>.</p>
<p>First we wrap the expression in a function to delay evaluation. This prevents any side effect from happening:</p>
<pre><code class="language-ocaml"># let some_expr () = 5;;
val some_expr : unit -> int = <fun>
</code></pre>
<p>Type inference is done and the definition almost has the type we want to name.</p>
<p>We will use first-class modules to construct a type declaration. The typechecker requires the signatures used in first-class modules to be named, so we define <code>T0</code>:</p>
<pre><code class="language-ocaml"># module type T0 = sig type my_type end;;
</code></pre>
<p>But the benefit of first-class modules is that they are actually expressions, which allows type inference to fill in the type information.</p>
<p>The next line is trickier, but look at the answer of <code>ocaml</code>:</p>
<pre><code class="language-ocaml"># module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
));;
module X : sig type my_type = int end
</code></pre>
<p>It seems we are almost done: we have a type definition <code>X.my_type = int</code> in the environment. It was produced by the type checker (we never referred to <code>int</code> ourselves). We could get away with a simple <code>include X</code> to bring <code>type my_type = int</code> into the environment. But that would also leave some garbage names behind (<code>some_expr</code>, <code>T0</code>, <code>X</code>)... It is bad to pollute the environment :).</p>
<p>That being said, let's review the last definition:</p>
<pre><code class="language-ocaml">(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
</code></pre>
<p>This function has type <code>(unit -> 'a) -> (module T0 with type my_type = 'a)</code>. It serves two purposes: extracting the type on the right-hand side of the arrow to get rid of the <code>unit</code> we introduced earlier, and producing a first-class module with the signature we are looking for.</p>
<p>The <code>with</code> constraint plays an important role. It turns the abstract type <code>my_type</code> of <code>T0</code> into a manifest one (<code>type my_type = a</code>). That's key to injecting a type variable into a type constructor.</p>
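<p>To see concretely what the constraint changes, here is a small toplevel experiment, independent of the encoding above:</p>
<pre><code class="language-ocaml"># module type T0 = sig type my_type end;;
module type T0 = sig type my_type end
# (module struct type my_type = int end : T0);;
- : (module T0) = <module>
# (module struct type my_type = int end : T0 with type my_type = int);;
- : (module T0 with type my_type = int) = <module>
</code></pre>
<p>Only the second package type records the equality <code>my_type = int</code>; that equality is what the typechecker will write for us in the encoding.</p>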
<p>The function is then applied to <code>some_expr</code>: unification replaces <code>'a</code> with <code>int</code> and the whole evaluates to a value of type <code>(module T0 with type my_type = int)</code>.</p>
<p>At last, <code>(val (...))</code> "opens the package": it turns the first-class module, a term, back into a module.</p>
<p>Now how do we clean the environment? In an ideal world, we would simply wrap the definition and project <code>X</code>:</p>
<pre><code class="language-ocaml">include struct
let some_expr () = <<some_expr>>
module type T0 = sig type my_type end
module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
))
end.X
</code></pre>
<p>However, projecting from a <em>syntactic</em> structure is not allowed in OCaml. It has to be bound to a name to allow projection... Like the argument of a functor! An anonymous functor can do the projection without leaving a trace.</p>
<p>The type of this functor is a bit tricky to define. It takes an argument that contains the <code>X</code> we want to project. The implementation could look like: <code>functor (M : sig module X end) -> M.X</code>.</p>
<p>However, we are not allowed to define the module <code>X</code> without giving it a type. But the functor doesn't do anything with the contents of <code>X</code>; it just returns it. An abstract module type is therefore sufficient: <code>functor (M : sig module type T module X : T end) -> M.X</code>.</p>
<p>The type of this functor is thus: <code>functor (M : sig module type T module X : T end) -> M.T</code>.</p>
<p>Which <code>T</code> should we pass to the functor? The <code>T0</code> above could do, but <code>my_type</code> is abstract in this signature. The <code>= int</code> would be lost, defeating our purpose. One more layer of module magic saves us: <code>module type T = module type of X</code>. <code>T</code> is exactly the type of <code>X</code>!</p>
<p>Putting everything together, we can construct the tricky structure and immediately project from it. Let's try in a fresh interpreter:</p>
<pre><code class="language-ocaml"># include (
(functor (M : sig module type T module X : T end) -> M.X)
(struct
let some_expr () = 5
module type T0 = sig type my_type end
module X = (val (
(fun (type a) (_ : unit -> a) :
(module T0 with type my_type = a) ->
(module struct type my_type = a end))
some_expr
))
module type T = module type of X
end)
);;
type my_type = int
#
</code></pre>
<p>It produces the type definition we were looking for, nothing more, nothing less :-). Note that the encoding not only does not pollute the global environment, but it also preserves the scope of <code><<some_expr>></code>. The rewriting is eco-friendly and hygienic: <code>T0</code>, <code>X</code>, etc., are not visible outside, so there is no risk of name clash.</p>
<h1 id="conclusion">Conclusion</h1>
<p>We just provided a syntactic construction that implements a <code>type_of</code> operator. It is limited to inferring monomorphic types. A polymorphic definition such as <code>[]</code> will lead to an error like:</p>
<pre><code>Error: The type of this packed module contains variables:
(module type_of with type my_type = 'a list)
</code></pre>
<p>which can be slightly improved to:</p>
<pre><code>Error: The type of this packed module contains variables:
(module type'of with type my_type = 'a list)
</code></pre>
<p>Not perfect but quite understandable. With the correct instrumentation, a PPX could report a precise location to the user. Now if only someone could write this PPX :P.</p>
<p>Finally, like Menhir's inference trick, this construction easily extends to multiple definitions, e.g.</p>
<pre><code class="language-ocaml">type a = [%type_of foo]
and b = [%type_of bar]
~=
include (
(functor (M : sig module type T module X : T end) -> M.X)
(struct
let some_expr () = (foo), (bar)
module type T0 = sig type a type b end
module X = (val (
(fun (type a b) (_ : unit -> a * b) :
(module T0 with type a = a and type b = b) ->
(module struct type nonrec a = a and b = b end))
some_expr
))
module type T = module type of X
end)
)
</code></pre><h1 id="ddcutil-controlling-the-brightness-of-an-external-monitor">DDCUTIL: controlling the brightness of an external monitor</h1>
<p>2021-06-25
13:16:27+09:00</p>
<p>TL;DR</p>
<ul>
<li>install <a href="https://www.ddcutil.com">ddcutil</a></li>
<li>decrease brightness with <code>sudo ddcutil setvcp 10 - 25</code></li>
<li>increase brightness with <code>sudo ddcutil setvcp 10 + 25</code></li>
<li>to remove the <code>sudo</code>, setup udev rules as suggested by <code>ddcutil</code> documentation, e.g. <code>/usr/share/ddcutil/data/45-ddcutil-i2c.rules</code> on my setup</li>
</ul>
<p>For some time I wondered why external displays can't be controlled from software just like internal ones. I care mostly about free software systems, but the grass doesn't seem greener on gatekept platforms.</p>
<p>The good news is that, contrary to what I feared, it's not a hardware limitation. Why is software control for external displays so niche? Maybe the market only cares about laptops, and desktops are now an afterthought :-).</p>
<p>Digging a bit on the internet led me first to <a href="https://github.com/kfix/ddcctl">ddcctl</a> for macOS, then quickly to <a href="https://ddcutil.com/">ddcutil</a> for free platforms. <a href="https://en.wikipedia.org/wiki/Display_Data_Channel">DDC</a>, or Display Data Channel, is a protocol to exchange control information with a monitor. When you plug in an external monitor on Linux, you might sometimes see new i2c devices appearing as <code>/dev/i2c-*</code>. Note that some monitors appear as USB devices; the procedure is essentially the same.</p>
<p><a href="https://en.wikipedia.org/wiki/I%C2%B2C">I2C</a> is a very simple communication bus and one of the <code>/dev/i2c-*</code> devices is a door to connect to your monitor. And <code>ddcutil</code> knows what to tell it. However, read/write access might require some privileges, for a first test, lets use <code>sudo</code>:</p>
<pre><code class="language-bash"># List displays that can be controlled
sudo ddcutil detect
# Decrease brightness
sudo ddcutil setvcp 10 - 25
# Increase brightness
sudo ddcutil setvcp 10 + 25
</code></pre>
<p>In these executions, <code>ddcutil</code> will scan all devices to find a monitor, then transmit the command.
The magic number <code>10</code> is the identifier of the brightness property, and the sample commands subtract or add 25 to it. More magic in the <a href="https://www.ddcutil.com/command_setvcp/">setvcp</a> documentation.</p>
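<p>To read the current value of a property before changing it, the matching <code>getvcp</code> command should work (same feature code):</p>
<pre><code class="language-bash"># Query the current brightness (VCP feature 10)
sudo ddcutil getvcp 10
</code></pre>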
<p>Finally, ddcutil comes with some sample udev rules so that you can use it from a normal account. See <a href="https://www.ddcutil.com/i2c_permissions/">ddcutil i2c permissions</a>, or <code>/usr/share/ddcutil/data/45-ddcutil-i2c.rules</code> on Arch.</p>
<p>I just had to bind the two <code>setvcp</code> commands to convenient keys and now I can control both the embedded and the external display from my keyboard (albeit with some latency for the external one). Thanks to the authors of ddcctl and ddcutil!</p><h1 id="prettyprinting-with-dominators">Pretty-printing with dominators</h1>
<p>2020-11-14
17:55:44+01:00</p>
<p>A static analysis that I am working on generates complex intermediate data structures. To help debug it, I wrote a few specialized pretty-printers. But these structures rely a lot on sharing (as in <a href="https://en.wikipedia.org/wiki/Hash_consing">hash consing</a>), and the output of the pretty-printers would easily blow up in size, to the extent that it was not helping debugging anymore.</p>
<p>Here is a simple example in javascript to see how things can go wrong:</p>
<pre><code class="language-javascript">leaf = "Some tag"
tree = { l: leaf, r: leaf }
tree = { l: tree, r: tree }
tree = { l: tree, r: tree }
console.info(JSON.stringify(tree, null, 2))
</code></pre>
<p>Which outputs:</p>
<pre><code class="language-javascript">{
"l": {
"l": {
"l": "Some tag",
"r": "Some tag"
},
"r": {
"l": "Some tag",
"r": "Some tag"
}
},
"r": {
"l": {
"l": "Some tag",
"r": "Some tag"
},
"r": {
"l": "Some tag",
"r": "Some tag"
}
}
}
</code></pre>
<p>A tree described in <span class="maths">n</span> steps turns into a JSON file of size <span class="maths">O(2^n)</span>. This is a pathological example, but even practical cases can grow enough to make them very hard to read.</p>
<h2 id="postorder-traversal">Post-order traversal</h2>
<p>To help make sense of it, I had to make sharing explicit in the pretty-printed term. A common notation for that is to use "let binders". The example above could be printed as:</p>
<pre><code class="language-javascript">let n0 = "Some tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = { "l": n1, "r": n1 } in
{ "l": n2, "r": n2 }
</code></pre>
<p>But how do we decide where to introduce them?</p>
<p><a href="https://c9x.me/">c9x</a> suggested a simple solution: introduce the bindings in the post-order traversal of the graph.</p>
<ul>
<li>we visit all nodes, labeling them by their index in the traversal</li>
<li>when we revisit a node, we mark it as shared, to remember to introduce it by a let-binding, and skip its children</li>
<li>we then print all shared nodes using let-binders, ordered by their post-order label.</li>
</ul>
<p>The post-order ensures that children nodes are bound before their parents.</p>
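<p>For the plain JavaScript objects used above, a minimal sketch of this labelling pass could look like this (cycles are not handled, since the structures here are acyclic):</p>
<pre><code class="language-javascript">function analyze(root) {
  const order = new Map()   // node -> post-order index
  const shared = new Set()  // nodes reached more than once
  let next = 0
  function visit(node) {
    if (node === null || typeof node !== "object") return
    if (order.has(node)) { shared.add(node); return } // revisit: mark shared, skip children
    visit(node.l)
    visit(node.r)
    order.set(node, next++)                           // label in post-order
  }
  visit(root)
  return { order, shared }
}
</code></pre>
<p>The printer then emits one let-binding per node in <code>shared</code>, sorted by its <code>order</code> label.</p>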
<p><em>Update:</em> <a href="https://github.com/c-cube">@c-cube</a> remarked that this analysis is suitable to export in <a href="https://github.com/leanprover/lean/blob/master/doc/export_format.md">Lean export format</a></p>
<h2 id="binding-at-dominators">Binding at dominators</h2>
<p>A simple post-order traversal would be good enough for a <em>printer</em>, but it is not really <em>pretty</em>. Shared nodes are all bound at the top-level: we recovered a compact notation but it doesn't preserve the locality of nodes. This example exhibits the problem:</p>
<pre><code class="language-javascript">lleaf = "Left tag"
ltree = { l: lleaf, r: lleaf }
ltree = { l: ltree, r: ltree }
rleaf = "Right tag"
rtree = { l: rleaf, r: rleaf }
rtree = { l: rtree, r: rtree }
tree = { l: ltree, r: rtree }
</code></pre>
<p>It gets pretty-printed by the post-order algorithm as:</p>
<pre><code class="language-javascript">let n0 = "Left tag" in
let n1 = { "l": n0, "r": n0 } in
let n2 = "Right tag" in
let n3 = { "l": n2, "r": n2 } in
{ "l": { "l": n1, "r": n1 },
"r": { "l": n3, "r": n3 } }
</code></pre>
<p>Sharing is represented, but it is not that easy to make sense of the structure. It would be much easier to contextualize if related components were close to each other.</p>
<p>Like by printing them in the smallest scope possible... Which, as observed by my friend <a href="https://github.com/trefis">@trefis</a>, would be at the node that dominates them!</p>
<p>If we introduce the let-bindings at the dominating nodes we get:</p>
<pre><code class="language-javascript">{
l: let n0 = "Left tag" in
let n1 = { l: n0, r: n0 } in
{ l: n1, r: n1},
r: let n2 = "Right tag" in
let n3 = { l: n2, r: n2 } in
{ l: n3, r: n3}
}
</code></pre>
<p>Which reads much better in practice.</p>
<h2 id="computing-dominators">Computing dominators</h2>
<p>There are two popular algorithms to compute dominators:</p>
<ul>
<li>The <a href="https://doi.org/10.1145%2F357062.357071">Lengauer-Tarjan</a> algorithm, described in "A fast algorithm for finding dominators in a flowgraph". It has the best known runtime for this problem (<span class="maths">|E|*\alpha(|E|,|V|)</span>, like union-find).</li>
<li>The more recent algorithm proposed by Cooper, Harvey & Kennedy, <a href="http://www.hipersoft.rice.edu/grads/publications/dom14.pdf">"A Simple, Fast Dominance Algorithm"</a>. Its worst case is <span class="maths">O(n^2)</span>, but it is much easier to implement.</li>
</ul>
<p>Furthermore, the simple one is linear for graphs that have a small <a href="https://en.wikipedia.org/wiki/Control-flow_graph#Loop_connectedness">"loop connectedness"</a>. Because the data structures I care about are cycle-free, it is indeed a perfect fit.</p>
<p>If they were cyclic it would be worth considering Lengauer-Tarjan to avoid degenerate cases. And to turn some "let" bindings into "let-rec" ones 🙂.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Dominators are useful for pretty-printing directed graphs in a textual form:</p>
<ul>
<li>explicit sharing reduces the size of the output and makes it easier to digest,</li>
<li>introducing the binders in the dominators reveals the shape of sharing and cycles.</li>
</ul>
<p>I was not expecting a dominance problem to appear in the middle of a pretty-printing algorithm. This was a pleasant discovery that, in retrospect, seems kind of obvious.</p>
<p><em>Thanks to <a href="https://github.com/Armael">@Armael</a> for some corrections</em></p>https://def.lakaban.net/2020-11-14-pretty-printing-with-dominators2020-11-14T00:00:00Z2020-11-14T00:00:00ZNottui & Lwd at ML Workshop 2020<h1 id="nottui--lwd-at-ml-workshop-2020">Nottui & Lwd at ML Workshop 2020</h1>
<p>2020-09-06
13:40:47+02:00</p>
<p>Last week, the ML & OCaml workshops were held as part of ICFP 2020.</p>
<p>There I presented "Nottui & Lwd - A friendly toolkit for the ML programmer".</p>
<p>Nottui builds on top of Notty to make user interfaces in the terminal.
Lwd is an abstraction for making "interactive documents", a limited form of reactivity that proved suitable as an alternative to the "DOM" (without diffing).</p>
<p>Links:</p>
<ul>
<li><p><a href="https://github.com/let-def/lwd">Github repository</a></p>
</li>
<li><p>the recording is available on <a href="https://www.youtube.com/watch?v=w7jc35kgBZE">Youtube</a> (<a href="ml2020.mp4">local copy</a>)</p>
</li>
<li><p>the <a href="slides.pdf">slides</a> that were presented</p>
</li>
<li><p>the <a href="proposal/proposal.pdf">proposal</a> that was submitted to OCaml workshop</p>
</li>
<li><p><a href="https://icfp2020workshops-unofficial.zulipchat.com/#narrow/stream/254450-ML-Workshop/topic/Nottui.20.26.20Lwd">Q&A transcription</a></p>
</li>
</ul>
<hr />
<p><img src="citty-2-2.png" alt="Citty is a terminal frontend to OCamllabs continuous integration service" /></p>
<p>Citty is a terminal frontend to OCamllabs' continuous integration service. The interface is rendered by Nottui & Lwd.</p><h1 id="inuit-textual-user-interfaces-ocaml-workshop-2016">Inuit: textual user interfaces, OCaml workshop 2016</h1>
<p>2020-09-05
12:00:00+02:00</p>
<p>Inuit is a library I developed a few years ago to introspect the internal state of running applications. At its core is an abstraction representing an interactive text buffer.</p>
<p>While doing some cleanup, I found the <a href="poster/poster.pdf">poster</a> that I submitted at OCaml workshop 4 years ago.</p>
<p>In this <a href="demo.mp4">demo</a> it is used to visualize the signature of an OCaml module with interactive folding.</p>
<p><img src="demo.jpg" alt="Inuit demo" /></p>
<p>The library is no longer developed as I am now focusing on <a href="../2020-09-06-nottui-lwd-at-ml-workshop-2020">Nottui & Lwd</a>.</p><h1 id="cuite-design-1-qobject-in-ocaml">Cuite design (1/?): QObject in OCaml</h1>
<p>2020-05-10
18:39:59+02:00</p>
<p>Two years ago, I worked on <a href="https://gitea.lakaban.net/def/cuite">"Cuite"</a>, an OCaml binding to Qt5. It stalled when I got to the point where all core concepts were mapped to OCaml. The remaining work was very repetitive: go through the huge hierarchy of Qt classes and bind each method, accommodating the occasional ad-hoc behavior.</p>
<p>There are also some shortcomings to revisit in my approach:</p>
<ul>
<li>Mapping between C++ and OCaml types is quite ad-hoc; there is no principled way to handle all the variations (some types behave like values, some like references, some exist as part of a graph, some make sense on their own, etc.).</li>
<li>The runtime support library relies a lot on internals of OCaml runtime, and would benefit from a cleanup.</li>
<li>The lack of ad-hoc polymorphism means that C++ method invocation has to be very explicit (e.g. <code>foo->setBar(baz)</code> translates to <code>Foo.setBar foo baz</code>). Also, the huge number of methods sometimes significantly slows down compilation.</li>
</ul>
<p>This post is the first of a series where I explain the thoughts that went in the design of the library and how these issues are addressed.</p>
<h2 id="exposing-qobjects">Exposing QObjects</h2>
<p>QObject is the root of the main class hierarchy in Qt. It is used everywhere: all widgets are QObject instances.</p>
<p>The binding needs to expose QObject classes, instances and functions to OCaml programs. In this post we will take a look at memory management: how QObjects are allocated and released when manipulated from OCaml.</p>
<p>There are a few properties that I wanted the binding to preserve. This is subjective, another binding might look for other properties. Here is what I was looking for:</p>
<ul>
<li>Runtime safety. Incorrect use of the API should translate to an exception, not to a segmentation fault or memory corruption.</li>
<li>Automatic memory management with opt-out. Most of the time, programmers should not have to worry about memory management. Occasionally, they might want to make sure memory is released on time. For instance, when allocating large objects such as pictures, it is nice to release memory as early as possible.</li>
<li>No arbitrary restrictions or ad-hoc rules for objects (unless there is no alternative). Programmers should not worry about cyclic references or have to manage certain objects differently (except maybe for performance reasons).</li>
<li>QObjects should interact well with other OCaml features. Physical equality, ordering, and hashing should make sense.</li>
</ul>
<p>I ended up with a scheme that provides all these properties to the binding. The rest of the post focuses on memory management for QObjects.</p>
<h2 id="qobject-values">QObject values</h2>
<p>Each QObject instance visible from the OCaml program is mapped to a unique value. This graph shows all the infrastructure involved.</p>
<p><img src="qobject-repr.svg" alt="Exposing a QObject to OCaml world" /></p>
<p>An instance <code>QObject *obj</code> is made accessible from OCaml code via the <code>mlproxy</code> value. In other words, we want the functions:</p>
<pre><code class="language-C++">value Val_QObject(QObject *obj);
QObject *QObject_val(value v);
</code></pre>
<h3 id="qobjectval-from-value-to-qobject">QObject<em></em>al: from value to QObject</h3>
<p>The OCaml block <code>mlproxy</code> contains a pointer to an object <code>cproxy</code> in the C++ heap. In turn <code>cproxy</code> has a pointer to <code>obj</code>, the <code>QObject</code>.</p>
<p>To get to the <code>QObject</code> from the OCaml value we just need to follow two pointers.</p>
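<p>Purely as an illustration (the actual representation in Cuite may differ), a sketch of this direction could look like the following, assuming <code>mlproxy</code> is a custom block whose data is the <code>cproxy</code> pointer, and that <code>cproxy</code> tracks the object with a <code>QPointer</code>:</p>
<pre><code class="language-c++">#include <caml/mlvalues.h>
#include <caml/custom.h>
#include <caml/fail.h>
#include <QObject>
#include <QPointer>

struct CProxy {
  QPointer<QObject> obj; // cleared by Qt when the object is deleted
  int weakid;            // index in the OCaml WeakTable, -1 if not exported yet
};

QObject *QObject_val(value v)
{
  // First hop: OCaml block -> C++ proxy
  CProxy *proxy = *(CProxy **)Data_custom_val(v);
  // Second hop: proxy -> QObject, failing if it was deleted in the meantime
  if (proxy->obj.isNull())
    caml_invalid_argument("QObject_val: object was deleted");
  return proxy->obj;
}
</code></pre>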
<h3 id="handling-qobject-destruction">Handling QObject destruction</h3>
<p>We need to keep track of when the QObject is deleted: the OCaml value might still be reachable and we don't want to accidentally dereference the QObject past that point.</p>
<p>This is not too difficult; we can either:</p>
<ul>
<li>Use a <code>QPointer<QObject></code> instead of a <code>QObject*</code>: Qt will clear the <code>QPointer</code> on object deletion.</li>
<li>Listen on the <a href="https://doc.qt.io/qt-5/qobject.html#destroyed">destroyed</a> signal of the QObject.</li>
</ul>
<h3 id="from-qobject-to-cproxy">From QObject to CProxy</h3>
<p>The <code>Val_QObject</code> function will be invoked many times; we don't want to create a new proxy each time. The <code>ProxyTable</code> remembers the <code>CProxy</code> associated with a <code>QObject</code>. It is a hash-table indexed by object addresses. It is populated by the helper function:</p>
<pre><code class="language-c++">static CProxy *QObject_proxy(QObject *obj);
</code></pre>
<p><code>QObject_proxy</code> starts by looking up the hash-table. If a valid proxy is found, it is returned. Otherwise, the object has not yet been exported to the OCaml world. We allocate, initialize, and add a new <code>CProxy</code> to the table. The <code>weakid</code> field is initialized to <code>-1</code>.</p>
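<p>A possible shape for this helper, again only a sketch (reusing the <code>CProxy</code> layout sketched above and a <code>QHash</code> keyed by object address; cleanup of entries on object destruction is omitted):</p>
<pre><code class="language-c++">#include <QHash>

static QHash<QObject *, CProxy *> proxy_table; // the ProxyTable

static CProxy *QObject_proxy(QObject *obj)
{
  CProxy *&slot = proxy_table[obj];  // default-constructed to nullptr if absent
  if (!slot)
  {
    slot = new CProxy;
    slot->obj = obj;                 // QPointer, cleared by Qt on deletion
    slot->weakid = -1;               // not yet exported to the OCaml heap
  }
  return slot;
}
</code></pre>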
<h3 id="from-cproxy-to-value-the-weakid-field">From CProxy to value: the weakid field</h3>
<p>We have a <code>CProxy</code>, but not yet an OCaml value. The <code>weakid</code> field is an index in the <code>WeakTable</code>, a global OCaml table that weakly references <code>MLProxy</code> values:</p>
<ul>
<li>If the field is not <code>-1</code>, a cell is already allocated. We can look directly in the weak table.</li>
<li>If the field is <code>-1</code>, we allocate and initialize a new <code>MLProxy</code> value that points to the CProxy and index it in the weak table.</li>
</ul>
<p>This is done from a primitive exported by OCaml code that also registers a finalizer to handle the cleanup of unreachable objects:</p>
<pre><code class="language-ocaml">val finalize_and_index : ml_proxy -> int
</code></pre>
<!--In pseudo-code: ```c
value Val_QObject(QObject *obj)
{
CAMLparam0;
CAMLlocal1(result);
cproxy *cproxy = QObject_proxy(obj);
if (cproxy->weakid = -1)
result = weak_cell(cproxy->weakid);
else
{
result = alloc_mlproxy(cproxy);
cproxy->weakid =
caml_callback(initialize_and_allocate_id, result);
}
CAMLreturn(result);
} ```-->
<p>Why go through the hoops of this weak table? Because C++ code needs to access the OCaml values, but normal roots are strong references. That would prevent <code>MLProxy</code> values from ever being collected by the GC.</p>
<h2 id="qobjectvalvalqobject-">QObject<em>val/Val</em>QObject: ✔️</h2>
<p>We now have both functions:</p>
<pre><code class="language-c++">value Val_QObject(QObject *obj);
QObject *QObject_val(value v);
</code></pre>
<p>They:</p>
<ul>
<li>can convert from value to QObject and from QObject to value</li>
<li>safely handle explicit QObject deletion</li>
<li>enable automatic deletion of unreachable objects</li>
</ul>
<h1 id="compact-lexer-table-representation">Compact lexer table representation</h1>
<p>2020-05-02
15:31:09+02:00</p>
<p>I found surprisingly little information on the transition tables of lexer generators.</p>
<p>There are plenty of resources on the front-end, such as the very nice <a href="https://www.ccs.neu.edu/home/turon/re-deriv.pdf">Regular-expression derivatives reexamined</a> paper.</p>
<p>However, resources on the transition table are much scarcer. Eventually, I found two references: <a href="https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools">The Dragon Book</a>, which explains a clever scheme for packing the table, and <a href="https://github.com/ocaml/ocaml/blob/trunk/lex/compact.ml">OCamllex</a>, which implements it<sup>1</sup>.</p>
<p><em>Update:</em> The <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29.8713&rep=rep1&type=pdf">Software and Hardware Techniques for Efficient Polymorphic Calls</a> thesis analyses a variant of the technique described in this post to store the dispatch tables of object-oriented languages. "Row displacement" proves to be very efficient in a closed world and extends well to multiple inheritance.</p>
<p>[^fn1]: Actually, I believe that the pseudo-code in the Dragon Book is wrong. There should be no recursive call to <code>nextState</code>, instead the default state should be returned directly. This is what OCamllex does.</p>
<h2 id="the-transition-table">The transition table</h2>
<p>The lexer generator frontend produces a deterministic finite automaton (<a href="https://en.wikipedia.org/wiki/Deterministic_finite_automaton">DFA</a>). Transitions are labeled by symbols from the input alphabet (a-z characters in the illustration below). Here is a trivial DFA recognizing the word "hello":</p>
<p><img src="lexer-automaton.gif" alt="Lexer automaton" /></p>
<p>We start from state 0 (the initial state). Then we follow the transitions until:</p>
<ul>
<li><strong>acceptance</strong>: if we reach state 5, the word "hello" has been recognized</li>
<li><strong>rejection</strong>: if we reach state 6, recognition failed</li>
</ul>
<p>The animation below shows the process of recognizing two words:</p>
<ul>
<li>success with "hello" input
<img src="lexer-hello.gif" alt="Accepting word "hello"" /></li>
<li>failure with "hey"
<img src="lexer-hey.gif" alt="Rejecting word "hey"" /></li>
</ul>
<p>We need an efficient way to store and follow these transitions.</p>
<h2 id="naive-representation">Naive representation</h2>
<p>The simplest representation is a matrix indexed by states and characters. In C that looks like:</p>
<pre><code class="language-c">// state_t is the type representing a state
// 256 because we work with 8-bit characters
state_t transition_table[MAX_STATES][256];
state_t next_state(state_t current, uint8_t input)
{
return transition_table[current][input];
}
</code></pre>
<p>This is efficient in time but not in space. The difficulty lies in finding a compact representation that does not compromise speed:</p>
<ul>
<li>Transitions will be followed for every input byte. This is the hottest part of the lexing process.</li>
<li>Practical languages can grow to thousands of states. The matrix then takes a few megabytes of memory.</li>
</ul>
<p>Here is the matrix for the "hello" example:</p>
<table class="std">
<tr><th>\ </th><th>a...d </th><th>e </th><th>f,g </th><th>h </th><th>i,j,k </th><th>l </th><th>m,n </th><th>o </th><th>p...z </th></tr>
<tr><td>0 </td><td>6 </td><td>6 </td><td>6 </td><td>1 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>1 </td><td>6 </td><td>2 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>2 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>3 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>3 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>4 </td><td>6 </td><td>6 </td><td>6 </td></tr>
<tr><td>4 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>6 </td><td>5 </td><td>6 </td></tr>
</table>
<p>We can see that it is very explicit and very redundant. A transition is very likely to target state 6!</p>
<h2 id="sparse-representation">Sparse representation</h2>
<p>The Dragon Book suggests representing each transition vector (a row of the table above) sparsely:</p>
<ul>
<li><strong>default transition</strong>: remember the most common target destination</li>
<li><strong>non-default transitions</strong>: store only the transitions that differs</li>
</ul>
<h3 id="with-associative-lists">With associative lists</h3>
<p>The sparse vectors can be represented with a default value and an associative list for storing non-default transitions.</p>
<p>The table becomes:</p>
<table class="std">
<tr><th></th><th>default </th><th>Transitions </th></tr>
<tr><td>0 </td><td>6 </td><td>(h, 1) </td></tr>
<tr><td>1 </td><td>6 </td><td>(e, 2) </td></tr>
<tr><td>2 </td><td>6 </td><td>(l, 3) </td></tr>
<tr><td>3 </td><td>6 </td><td>(l, 4) </td></tr>
<tr><td>4 </td><td>6 </td><td>(o, 5) </td></tr>
</table>
<p>Much more compact!</p>
<p>But there is a performance problem: for each transition, we have to iterate the list looking for a match. A list can be as big as the size of the alphabet. That would lead to unpredictable and often slow performance – unacceptable.</p>
<h3 id="with-overlapping-vectors">With overlapping vectors</h3>
<p>The Dragon Book comes to the rescue and introduces a clever scheme that retains the performance of array-based lookup with the compactness of sparse vectors. The scheme is as follows:</p>
<ul>
<li>Store all vectors in the same array</li>
<li>Offset them so that no two non-default transitions overlap</li>
<li>Annotate the non-default transitions with their source state</li>
</ul>
<p>With this mechanism, the automaton looks like:</p>
<ul>
<li><p>State table:</p>
<table class="std">
<tr><th></th><th>Default </th><th>Offset </th></tr>
<tr><td>0 </td><td>6 </td><td>0 </td></tr>
<tr><td>1 </td><td>6 </td><td>0 </td></tr>
<tr><td>2 </td><td>6 </td><td>0 </td></tr>
<tr><td>3 </td><td>6 </td><td>1 </td></tr>
<tr><td>4 </td><td>6 </td><td>0 </td></tr>
</table>
</li>
<li><p>Transition table:</p>
<table class="std">
<tr><th>index </th><th style="text-align: center">0...3 </th><th style="text-align: center">4 </th><th style="text-align: center">5,6 </th><th style="text-align: center">7 </th><th style="text-align: center">8,9,10 </th><th style="text-align: center">11 </th><th style="text-align: center">12 </th><th style="text-align: center">13 </th><th style="text-align: center">14 </th><th style="text-align: center">15..26 </th></tr>
<tr><td>source </td><td style="text-align: center">Ø </td><td style="text-align: center">1 </td><td style="text-align: center">Ø </td><td style="text-align: center">0 </td><td style="text-align: center">Ø </td><td style="text-align: center">2 </td><td style="text-align: center">3 </td><td style="text-align: center">Ø </td><td style="text-align: center">4 </td><td style="text-align: center">Ø </td></tr>
<tr><td>target </td><td style="text-align: center">Ø </td><td style="text-align: center">2 </td><td style="text-align: center">Ø </td><td style="text-align: center">1 </td><td style="text-align: center">Ø </td><td style="text-align: center">3 </td><td style="text-align: center">4 </td><td style="text-align: center">Ø </td><td style="text-align: center">5 </td><td style="text-align: center">Ø </td></tr>
</table>
<p>(Using Ø: any value that does not represent a valid state)</p>
</li>
</ul>
<p>We avoid the waste of the naive matrix by filling the unused cells of sparse vectors with the content of others. And we keep the fast access characteristics of arrays.</p>
<p>Here is the mapping between index and characters at offset 0 and 1:</p>
<table class="std">
<tr><th style="text-align: left">index </th><th style="text-align: center">0 </th><th style="text-align: center">1,2,3 </th><th style="text-align: center">4 </th><th style="text-align: center">5,6 </th><th style="text-align: center">7 </th><th style="text-align: center">8,9,10 </th><th style="text-align: center">11 </th><th style="text-align: center">12 </th><th style="text-align: center">13 </th><th style="text-align: center">14 </th><th style="text-align: center">15..25 </th><th>26 </th></tr>
<tr><td style="text-align: left">at offset 0 </td><td style="text-align: center">a </td><td style="text-align: center">b,c,d </td><td style="text-align: center">e </td><td style="text-align: center">f,g </td><td style="text-align: center">h </td><td style="text-align: center">i,j,k </td><td style="text-align: center">l </td><td style="text-align: center">m </td><td style="text-align: center">n </td><td style="text-align: center">o </td><td style="text-align: center">p..z </td><td></td></tr>
<tr><td style="text-align: left">at offset 1 </td><td style="text-align: center"></td><td style="text-align: center">a,b,c </td><td style="text-align: center">d </td><td style="text-align: center">e,f </td><td style="text-align: center">g </td><td style="text-align: center">h,i,j </td><td style="text-align: center">k </td><td style="text-align: center">l </td><td style="text-align: center">m </td><td style="text-align: center">n </td><td style="text-align: center">o..w </td><td>z </td></tr>
</table>
<p>States 0, 1, 2, and 4 have been given offset 0. Their non-default transitions never conflict: rather than having a separate 26-element vector for each of them, we can overlap all of them in the same vector.</p>
<p>State 3 is more complicated. It cannot be at offset 0: it has a transition on <em>l</em> that would end up at column 11, but this column is already used by state 2. However, column 12, just after it, is not used by any other state. So we offset the state by 1, shifting the meaning of characters: <em>l</em> at offset 1 maps to column 12. (It coincides with <em>m</em> at offset 0, but no state has a transition on <em>m</em>.)</p>
<p>With offsets, all transitions can fit in a single vector of 27 elements. Each cell is a bit larger because it stores a pair of states (a source and a target).</p>
<p>The implementation is now:</p>
<pre><code class="language-c">typedef struct {
state_t default_;
int offset;
} state_desc;
typedef struct {
state_t source, target;
} transition_t;
state_desc state_table[MAX_STATES];
transition_t transition_table[MAX_TRANSITIONS];
state_t next_state(state_t current, uint8_t input)
{
int index = state_table[current].offset + input;
if (transition_table[index].source == current)
return transition_table[index].target;
else
return state_table[current].default_;
}
</code></pre>
<p>The tables are a bit harder to generate than the naive matrix. How do we find the right offsets? A simple greedy strategy gives good packings (a sketch in C follows the list):</p>
<ul>
<li>Start from the first vector</li>
<li>Try to fit it at offset 0:
<ul>
<li>If there is no overlap, done</li>
<li>If it overlaps, try again at the next offset</li>
</ul>
</li>
<li>Repeat with the next vector, until all vectors are packed</li>
</ul>
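<p>Here is a minimal sketch of this first-fit packing, assuming each sparse vector is described by the list of characters on which it has a non-default transition (the names are illustrative, not the post's actual generator):</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

/* occupied[i] tells whether cell i of the shared array is already used.
 * chars[0..n-1] are the characters on which this state has a non-default
 * transition.  Returns the first offset where the vector fits, or -1. */
int pack_vector(bool *occupied, int array_size, const uint8_t *chars, int n)
{
    for (int offset = 0; offset + 255 < array_size; offset++) {
        bool fits = true;
        for (int i = 0; i < n && fits; i++)
            fits = !occupied[offset + chars[i]];
        if (!fits)
            continue;                             /* overlap: try the next offset */
        for (int i = 0; i < n; i++)
            occupied[offset + chars[i]] = true;   /* claim the cells */
        return offset;
    }
    return -1;                                    /* the shared array is too small */
}
</code></pre>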
<h2 id="engineering-tricks">Engineering tricks</h2>
<p>Algorithmically, this solution is satisfying. I went a bit further to make it more hardware-friendly while maintaining a good space/time trade-off.</p>
<p>Something we did not specify above is the size of each type. How many bits for a <code>state_t</code>? OCamllex has hard-coded limits that can be reached on big yet realistic languages. These limits save space but make the lexer less flexible. I wanted more freedom here.</p>
<p>I set myself the goal of storing everything in a single array of 32-bit values. I ended up with 23 bits for offsets. This allows for a theoretical maximum of ~8 million transitions (2^23), using up to 32 MiB.</p>
<h3 id="1-disambiguate-using-characters">1. Disambiguate using characters</h3>
<p>Rather than storing a source state in a transition to distinguish non-default from default transitions, we store an input character: a transition is non-default if we reached it by following this input character. I call it the input <em>disambiguator</em>.</p>
<pre><code class="language-c">typedef struct {
uint8_t input;
state_t target;
} transition_t;
state_t next_state(state_t current, uint8_t input)
{
int index = state_table[current].offset + input;
if (transition_table[index].input == input)
return transition_table[index].target;
else
return state_table[current].default_;
}
</code></pre>
<p>By itself, this change only shaves a few bits off a transition cell. It also forces us to store each state at a different offset (otherwise lookups would be ambiguous). For the "hello" example, the offsets are now <em>(0, 1, 2, 3, 4)</em>.</p>
<p>But we replaced a vector of states by a vector of characters. There can be many states but there are only 256 characters. We can exploit this in the low-level representation.</p>
<h3 id="2-represent-states-by-their-offsets">2. Represent states by their offsets</h3>
<p>Now that each state has a unique offset we can directly represent them using offsets, rather than consecutive numbers.</p>
<p>We get rid of the <code>offset</code> entry in the state table and store the <code>default_</code> transition as if it were on character "-1", just before the offset:</p>
<ul>
<li><code>transition_table[offset + c]</code>: transition information from state <code>offset</code> and input character <code>c</code></li>
<li><code>transition_table[offset - 1]</code>: default transition for state <code>offset</code></li>
</ul>
<p>The <code>input</code> <em>disambiguator</em> for <code>transition_table[offset - 1]</code> is chosen so that it does not coincide with a non-default transition of another valid state. In other words, <code>offset - 1 - transition_table[offset - 1].input</code> should not be the offset of another state.</p>
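<p>As a concrete reading of this constraint (a hypothetical helper, not code from the post), a generator could validate a candidate disambiguator for a default cell like this, where <code>is_state_offset</code> is an assumed predicate telling whether an index is the offset of a valid state:</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

/* The default cell of the state at `offset` lives at index offset - 1 and
 * carries the disambiguator `input`.  If offset - 1 - input were the offset
 * of another valid state, that state would wrongly read the cell as its own
 * non-default transition on `input`. */
bool default_cell_is_unambiguous(int offset, uint8_t input,
                                 bool (*is_state_offset)(int))
{
    return !is_state_offset(offset - 1 - input);
}
</code></pre>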
<p>Everything fits in a single array now:</p>
<pre><code class="language-c">typedef struct {
uint8_t input;
state_t target;
} transition_t;
transition_t transition_table[MAX_TRANSITIONS];
state_t next_state(state_t current, uint8_t input)
{
if (transition_table[current + input].input == input)
return transition_table[current + input].target;
else
return transition_table[current - 1].target;
}
</code></pre>
<p>By making the state fit in 24 bits, we can represent a transition in a single 32-bit value:</p>
<pre><code class="language-c">typedef int32_t state_t;
typedef struct {
uint8_t input : 8;
state_t target : 24;
} transition_t;
</code></pre>
<h3 id="3-negative-numbers-for-special-actions">3. Negative numbers for special actions</h3>
<p>In the example, states 5 and 6 have a special meaning: accepting or rejecting the input. From the point of view of the automaton they do the same: terminate the analysis and yield control back to the caller. It is the caller that will act differently based on the reason for the termination.</p>
<p>Thus the automaton does not assign any meaning to special transitions other than stopping the analysis. The driver, on the other hand, can have many actions. For instance:</p>
<ul>
<li>backtracking: remember the current state, continue the analysis, and if a rejection state is reached later, fall back to the remembered state and act as if it were accepting</li>
<li>tagging: mark the current state as a "point of interest" for the program, and resume the analysis. This can be used to implement capture groups</li>
</ul>
<p>The special transitions just need to be distinguished from normal states. For this, I simply chose to use negative values, which cannot represent states. This reduces the number of usable bits in a <code>state_t</code> to 23 (for a maximum table size of 32 MiB).</p>
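<p>The post does not spell out how the actions themselves are numbered. As a purely illustrative convention (an assumption, not the author's encoding), one could store action number <em>k</em> as the negative value <em>-(k + 1)</em>, so the driver can tell actions and states apart and recover <em>k</em>:</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

typedef int32_t state_t;   /* same 32-bit state type as above */

/* Hypothetical encoding: action k is stored as -(k + 1), so every special
 * value is strictly negative and can never collide with a valid state. */
static inline state_t encode_action(int k)     { return -(k + 1); }
static inline bool    is_action(state_t s)     { return s < 0; }
static inline int     decode_action(state_t s) { return -s - 1; }
</code></pre>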
<h3 id="handling-endoffile">Handling end-of-file</h3>
<p>The end-of-file condition (EOF) is reached when there is no more input to feed to the automaton. That can happen at any time, so we should always be ready to handle EOF.</p>
<p>Special actions behave like extra states; EOF behaves like an extra transition.</p>
<p>OCamllex deals with EOF like any other input symbol, by using an alphabet of 257 symbols. I chose to treat EOF differently:</p>
<ul>
<li>To keep using 8-bit integers for the input disambiguator</li>
<li>EOF is a unique situation: it happens only once per run, and it happens last. It does not have to be on the fast path.</li>
</ul>
<p>The remaining degree of freedom we had in the representation of states is the input disambiguator. We use it to encode the EOF transition.</p>
<p>We make it point to any unused transition cell, which is re-purposed to hold the EOF destination state. The disambiguator of this EOF cell can be anything, as long as it is not ambiguous. We end up with a different transition function for EOF:</p>
<pre><code class="language-c">state_t eof_state(state_t current)
{
int idx = transition_table[current - 1].input;
return transition_table[current - 1 - idx].target;
}
</code></pre>
<p>All these optimizations put more pressure on the packing algorithm. But the added freedom can reduce fragmentation in the sparse array, and in practice many states have the same EOF transition:</p>
<ul>
<li>The packing algorithm can share a single EOF cell among many states, improving efficiency.</li>
<li>The original scheme, the one with several tables, has to give a different offset to each state. What at first seemed like a drawback of the single-table scheme also happens in practice in the original one.</li>
</ul>
<p>There is one last optimization we could do for storing EOF transitions. Because EOF happens at the end of the analysis, it only makes sense for EOF transitions to target special actions.</p>
<p>We can therefore use this extra bit of information to introduce more sharing of EOF transitions: an EOF transition that targets a regular state is interpreted as a default transition, and we then repeat the lookup for an EOF transition from that default state.</p>
<pre><code class="language-c">state_t eof_state(state_t current)
{
while (1)
{
int offset = transition_table[current - 1].input;
int eof_index = current - 1 - offset;
state_t target = transition_table[eof_index].target;
if (target <= 0)
return target;
current = transition_table[current - 1].target;
}
}
</code></pre>
<p>This complicates the packing scheme for diminishing returns. I did not bother implementing it.</p>
<h2 id="final-implementation">Final implementation</h2>
<p>Putting everything together, I got this implementation for the core loop of the lexer:</p>
<pre><code class="language-c">typedef int32_t state_t;
typedef uint32_t transition_t;
#define SRC(transition) ((transition) & 0xFF)
#define DST(transition) ((int32_t)(transition) >> 8)
state_t follow(transition_t *table, state_t state,
unsigned char **buf, unsigned char *end)
{
unsigned char *ptr = *buf;
while (ptr < end && state > 0)
{
unsigned char c = *ptr++;
transition_t def = table[state - 1];
transition_t nxt = table[state + c];
state = DST((SRC(nxt) == c) ? nxt : def);
}
*buf = ptr;
return state;
}
state_t follow_eof(transition_t *table, state_t state)
{
int idx = SRC(table[state - 1]);
return DST(table[state - 1 - idx]);
}
</code></pre>
<p>The interpretation function consumes as many characters as possible. This reduces the interpretation overhead (the cost of entering and leaving the interpretation function). We want to spend most of the time in the hot loop!</p>
<p>Note that the loop is quite machine-friendly:</p>
<ul>
<li>The two loads can be issued in parallel</li>
<li>State selection compiles to branch-less code</li>
</ul>
<p>The only branching is the check for the exit condition. It is unavoidable but it happens once and is well predicted.</p>
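<p>To see how the two functions fit together, here is a minimal sketch of a caller (the single-buffer setup and the <code>handle_action</code> callback are assumptions, not part of the post):</p>
<pre><code class="language-c">void handle_action(state_t action);  /* hypothetical: interpret the special action */

/* Run the automaton over a single buffer, then take the EOF transition if
 * the input was exhausted before reaching a special action. */
void run_lexer(transition_t *table, state_t initial,
               unsigned char *buf, unsigned char *end)
{
    state_t state = follow(table, initial, &buf, end);
    if (state > 0)                  /* input exhausted in a regular state */
        state = follow_eof(table, state);
    /* state now encodes a special action; what it means (accept, reject,
     * backtrack, tag, ...) is up to the caller. */
    handle_action(state);
}
</code></pre>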
<h1 id="conclusion">Conclusion</h1>
<p>I presented some techniques for storing the transition table of a lexer. The main result is a simple 40-year-old scheme. It is effective and a few adjustments make it perform even better on modern hardware.</p>
<p>I apologize for not having benchmark figures to show... I did not want to spend the time implementing a production-grade lexing engine. I was just interested in playing around with the full pipeline rather than stopping after the frontend. If I ever need to design a complete lexer, I have a clear picture of what it should look like.</p>
<p>In the future, I plan to tackle some useful extensions like extraction and lookahead (along the lines of <a href="https://re2c.org/2017_trofimovich_tagged_deterministic_finite_automata_with_lookahead.pdf">Tagged Deterministic Finite Automata with Lookahead</a>).</p>
<h2 id="going-further">Going further</h2>
<p>To handle UTF-8 and other character encodings, I came to the conclusion that the best approach was to generate the automaton for a fixed encoding (e.g. a normalized form of UTF-8), with a preprocessing step to convert the input. The automaton would still work on an 8-bit alphabet, possibly simulating a single codepoint with multiple transitions.</p>
<p>Out of curiosity, I tried to represent transitions using various forms of packed intervals on which to do binary search: basically a sorted sequence of <em>(first codepoint, last codepoint, target state)</em>. This is a cheap way to handle large alphabets, but I did not manage to make it competitive with the sparse representation, even with clever implementations of binary search like those on <a href="http://pvk.ca/Blog/2015/11/29/retrospective-on-binary-search-and-on-compression-slash-compilation/">PVK's excellent blog</a>. That ruled out the approach for me.</p>https://def.lakaban.net/2020-05-02-compact-lexer-table-representation2020-05-02T00:00:00Z2020-05-02T00:00:00ZUpdating an NVMe firmware<h1 id="mettre-a-jour-un-firmware-nvme">Updating an NVMe firmware</h1>
<p>2020-03-15
06:45:48+01:00</p>
<p>How to update the firmware of a Toshiba SSD on Linux.</p>
<p><strong>Update 2 (25/08/2020):</strong> Via this guide for <a href="https://www.tonymacx86.com/threads/guide-sierra-on-hp-spectre-x360-native-kaby-lake-support.228302/">hackintosh</a>, I found that it is possible to format the NVMe drive with either 512-byte or 4-kilobyte sectors. I reformatted it with 4-kilobyte sectors and performance has been back to normal ever since!
Just run:</p>
<pre><code class="language-bash">$ nvme-format --lbaf=1 /dev/nvme0n1
</code></pre>
<p>The index selects the sector size, as reported by <code>smartctl</code>:</p>
<pre><code>$ smartctl -a /dev/nvme0n1
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
1 - 4096 0 0
</code></pre>
<p><strong>WARNING</strong>: this erases the whole drive! It is obviously not very safe to keep using an SSD that has already shown signs of instability (performance-wise, not data loss, but still...), yet it has been running like a charm in the two months since I reformatted it :-).</p>
<p><strong>Update 1 (26/04/2020):</strong> The firmware update only extended the SSD's life by a few weeks. No data was lost, and this reprieve at least left time to check the drive! Updating a firmware is nevertheless risky, all the more so on a failing drive.</p>
<h4 id="tldr">TLDR:</h4>
<pre><code>$ 7z x Toshiba\ KXG50ZNV256G_KXG50ZNV512G_KXG50ZNV1T02\ 7YXTM_ZPE.exe
$ 7z x XG5-AADA4107-64bit.exe
$ sudo nvme fw-download /dev/nvme0n1 --fw=AADA4107.sig
$ sudo nvme fw-commit /dev/nvme0n1 --slot=0 --action=1
$ reboot
</code></pre>
<p>The Dell XPS 9360 is stuttering more and more. It seems the SSD is to blame.</p>
<p>Switching the I/O scheduler from <code>none</code> to <code>mq-deadline</code> and then <code>kyber</code> significantly reduces latency. Enough to browse the web, but not to watch videos.</p>
<p>A search for this SSD model reveals that firmware updates exist: it is a Toshiba KXG50ZNV256G with firmware AADA4102. A visit to Dell's website lets you download firmware AADA4107:</p>
<p><a href="https://www.dell.com/support/home/fr/fr/frbsdt1/drivers/driversdetails?driverid=7yxtm">Toshiba SSD firmware update for KXG50ZNV256G</a></p>
<p>There is no mention of Linux on that site; fortunately the <a href="https://github.com/linux-nvme/nvme-cli">nvme-cli</a> command-line tools make it possible to update the firmware (a bit thin for a laptop sold at a premium for its Linux support).</p>
<h4 id="obtenir-le-firmware">Obtenir le firmware</h4>
<ol>
<li><p>Download the executable containing the update</p>
</li>
<li><p>Extract the file "AADA4107.sig". It worked for me by opening the executable with file-roller (GNOME's archive manager), then opening the second executable contained inside.
You can also get by with <code>7z</code>:</p>
<pre><code class="language-shell">$ 7z x Toshiba\ KXG50ZNV256G_KXG50ZNV512G_KXG50ZNV1T02\ 7YXTM_ZPE.exe
$ 7z x XG5-AADA4107-64bit.exe
</code></pre>
</li>
</ol>
<p>You should now have the file "AADA4107.sig":</p>
<pre><code class="language-shell">$ stat AADA4107.sig
File: AADA4107.sig
Size: 1601536
</code></pre>
<p>(Toshiba, Dell: in a better world, why not let us download this file directly?)</p>
<h4 id="charger-le-firmware-dans-le-ssd">Charger le firmware dans le SSD</h4>
<p>A quick introduction to the <code>nvme-cli</code> tool:</p>
<ul>
<li><p>everything goes through the <code>nvme</code> binary; if you don't have it yet, it is <code>pacman -S nvme-cli</code> on Arch Linux.</p>
</li>
<li><p>the <code>list</code> subcommand should describe the SSD. Here:</p>
<pre><code>$ nvme list
Node SN Model Namespace Usage Format FW Rev
–––––––––––––––– –––––––––––––––––––– –––––––––––––––––––––––––––––––––––––––– ––––––––– –––––––––––––––––––––––––– –––––––––––––––– ––––––––
/dev/nvme0n1 18MS105XTY5T KXG50ZNV256G NVMe TOSHIBA 256GB 1 256,06 GB / 256,06 GB 512 B + 0 B AADA4107
</code></pre>
</li>
<li><p>the <code>fw-log &lt;device&gt;</code> subcommand describes the current firmware:</p>
<pre><code>$ nvme fw-log /dev/nvme0n1
afi : 0x1
frs1 : 0x3230313441444141 (AADA4102)
frs2 : 0x3230313441444141 (AADA4102)
</code></pre>
</li>
</ul>
<p>The update happens in two steps: load the firmware with <code>fw-download</code>, then activate it with <code>fw-commit</code>.</p>
<pre><code>$ nvme fw-download /dev/nvme0n1 --fw=AADA4107.sig
$ nvme fw-commit /dev/nvme0n1 --slot=0 --action=1
</code></pre>
<p>All that's left is to reboot the computer, and <code>fw-log</code> should show that version <code>AADA4107</code> is now installed:</p>
<pre><code>$ nvme fw-log /dev/nvme0n1
Firmware Log for device:nvme0n1
afi : 0x1
frs1 : 0x3730313441444141 (AADA4107)
frs2 : 0x3230313441444141 (AADA4102)
</code></pre>https://def.lakaban.net/2020-03-14-mettre-a-jour-un-firmware-nvme2020-03-15T00:00:00Z2020-03-15T00:00:00Z