If item \(T\to\bullet n\) is active or reachable
report "Expecting a number"
If item \(T\to(E\bullet)\) is active or reachable
report "A parenthesis has not been closed"
In LRGrep, this is written as two clauses with reduce-filter patterns: \[ \begin{array}{rcccll} [& \_* &/& T\to\bullet n &] & \texttt{ \{ "Expecting a number" \}} \\ [& \_* &/& T\to(E\bullet) &] & \texttt{ \{ "A parenthesis has not been closed" \}} \end{array} \]
\(\begin{array}{ll} \qquad \uparrow & \quad \uparrow \\ \text{Reduction target} & \text{Item filter} \\ \end{array}\)
A dialect of regular expressions denoring sets of LR stacks:
\[ \begin{align*} &&e::=&\ X && \text{— Atom (terminal or non-terminal)} \\ &&|&\ \_ && \text{— Any symbol} \\ &&|&\ e_1;e_2 && \text{— Concatenation} \\ &&|&\ e_1|e_2 && \text{— Disjunction} \\ &&|&\ e^* && \text{— Kleene star} \\ &&|&\ x=e && \text{— Named capture (variable binding)} \\ \\ &&|&\ [e] && \text{— Matching up to reduction} \\ &&|&\ /A\to\beta\bullet\gamma && \text{— Filter for item} \\ \end{align*} \]
We now have the tools to tackle the starting example.
¹ let x = 5;
² let y = 6
³ let z = 7
The ;
changes the interpretation of the next let
from a global definition to a local one.
It is likely the culprit.
We would like to warn the user in the error message:
File "example.ml", line 3, characters 0-3:
3 | let z = 7
^^^
Error: unterminated local let-binding, expecting "in".
Suggestion: remove the ";" at line 1, characters 9-10:
1 | let x = 5;
^
To recognize this situation, we decompose the input into three parts:
let x = 5 | – the prefix that forms a valid definition |
; | – the problematic semicolon |
let y = 6 | – the unterminated let-binding |
Which translates to:
| [structure_item]; semi=SEMI; [let_binding]
{
report "unterminated local let-binding, expecting \"in\".";
hint "remove the \";\" at " $loc(semi)
}
The DSL has many decidable properties.
Besides a compiler, I have implemented tools to help grammar authors write error specifications.
Given a grammar and an invalid sentence, list the reduce-filter patterns by annotating stack trace.
Given a grammar, enumerate enough sentences to cover all reduce-filter patterns (stronger than just covering all states).
Given a grammar and an error specification, check that all failures have an explanation and that all explanations are reachable. Produce counter-examples if not.
The first application, it heavily influenced the design of LRGrep.
Errors fall into two categories.
Cover all cases with reduce-filter patterns. Automatically discovered using enumeration and coverage. |
Refine confusing cases that come from unexpected interactions between grammatical constructions. Use advanced features of the DSL. Reported by OCaml users, specified using the interpreter. |
A language for formalizing tax law.
Error messages are even more important: it targets lawyers and not just developers.
The parser is implemented using Menhir with an exhaustive specification of errors.
Goal: Adapt the Menhir specification to an LRGrep specification.
The menhir specification has 3604 lines and 235 examples.
The LRGrep specification has 589 lines and 177 clauses.
Results:
Elm is a functional programming language for web applications.
The compiler is renowned for the quality of its error messages.
Completely different technology:
Approach:
Results: