|
| 1 | +(* Attributes guide the recovery. |
| 2 | +
|
| 3 | + Some information can be passed to Menhir-recover via attributes. These are |
| 4 | + strings that are ignored by Menhir itself and are transmitted to |
| 5 | + Menhir-recover. |
| 6 | +
|
| 7 | + The attributes that are relevant to Menhir-recover are always prefixed with |
| 8 | + `recover.`. An attribute with this prefix that is not understood by |
| 9 | + Menhir-recover produces a warning message (to help detect a typo or a |
| 10 | + misplaced attribute).*) |
| 11 | + |
| 12 | +(** Specification of attributes that are meaningful for recovery*) |
| 13 | +module type ATTRIBUTES = sig |
| 14 | +module G : MenhirSdk.Cmly_api.GRAMMAR |
| 15 | + |
| 16 | +(** The Menhir grammar to which these apply*) |
| 17 | + |
| 18 | +(** Recovery cost |
| 19 | +
|
| 20 | + When the parser is in an error state, Menhir-recover will invent some |
| 21 | + input that recovers from this error. In most grammars, this problem has |
| 22 | + many solutions, often infinitely many. |
| 23 | +
|
| 24 | + But not all solutions are equally nice. Some will have repetitions, some |
| 25 | + will generate undesirable AST nodes or trigger error reductions... |
| 26 | +
|
| 27 | + To guide this process, a cost can be associated with each symbol (terminal |
| 28 | + or non-terminal), and the cost of the recovery will be the sum of the costs |
| 29 | + of all symbols in the generated sentence.*) |
| 30 | + |
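As a rough illustration of the "sum of the costs" rule described above, here is a minimal sketch. The `symbol_cost` table and the `sentence_cost` helper are hypothetical stand-ins written for this note; they are not part of Menhir-recover, which works on `G.symbol` and `Cost.t` values rather than strings and floats.

```ocaml
(* Hypothetical per-symbol costs, keyed by symbol name for simplicity.
   Symbols not listed get an infinite cost in this sketch. *)
let symbol_cost = function
  | "PLUS" -> 1.0
  | "TIMES" -> 10.0
  | "DOT" -> 0.0
  | _ -> infinity

(* Cost of a candidate recovery sentence: the sum of the costs of its symbols. *)
let sentence_cost symbols =
  List.fold_left (fun acc s -> acc +. symbol_cost s) 0.0 symbols
```

Under these assumptions, a candidate sentence containing any symbol of infinite cost has an infinite total cost, so a cheaper candidate is always preferred when one exists.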
| 31 | +(** Symbol cost |
| 32 | +
|
| 33 | + The `recover.cost` attribute is attached to the definition of symbols |
| 34 | + (terminals and non-terminals) and takes a floating point value. |
| 35 | +
|
| 36 | + %token PLUS [@recover.cost 1.0] |
| 37 | +
|
| 38 | + expr [@recover.cost 1.0]: ... ;*) |
| 39 | + |
| 40 | +val cost_of_symbol : G.symbol -> Cost.t |
| 41 | +(** Cost of a grammar symbol*) |
| 42 | + |
| 43 | +(** Item cost |
| 44 | +
|
| 45 | + The cost can be applied to a specific item (an occurrence of a symbol in a |
| 46 | + rule). |
| 47 | +
|
| 48 | + In this case, the more specific cost will replace the global cost for this |
| 49 | + specific occurrence. |
| 50 | +
|
| 51 | + expr: | INT PLUS [@recover.cost 0.0] INT \{ ... \} | INT TIMES |
| 52 | + [@recover.cost 10.0] INT \{ ... \} ; |
| 53 | +
|
| 54 | + In this example, if an error happens just after an integer in an |
| 55 | + expression, the `PLUS` rule will be favored over the `TIMES` rule because |
| 56 | + the `TIMES` token is more expensive.*) |
| 57 | + |
| 58 | +val penalty_of_item : G.production * int -> Cost.t |
| 59 | +(** Penalty (added cost) for shifting an item*) |
| 60 | + |
| 61 | +(** Reduction cost |
| 62 | +
|
| 63 | + The last place where a `recover.cost` is accepted is in a production. This |
| 64 | + is convenient to prevent the recovery from triggering some semantic actions. |
| 65 | +
|
| 66 | + expr: LPAREN expr error \{ ... \} [@recover.cost infinity] ; |
| 67 | +
|
| 68 | + It would not make much sense for the recovery to select an error rule. |
| 69 | + Associating an infinite cost with the production ensures that this never |
| 70 | + happens.*) |
| 71 | + |
| 72 | +val cost_of_prod : G.production -> Cost.t |
| 73 | +(** Cost of reducing a production*) |
| 74 | + |
| 75 | +(** Meaning of costs |
| 76 | +
|
| 77 | + The cost should be a non-negative floating-point value. Both 0.0 and +∞ |
| 78 | + are accepted. |
| 79 | +
|
| 80 | + If not specified, the default cost depends on the presence of a semantic |
| 81 | + value: |
| 82 | + - for a terminal without semantic value (such as `%token DOT`) it is 0.0. |
| 83 | + - for a terminal with a semantic value (such as `%token<int> INT`) or a |
| 84 | + non-terminal it is +∞. |
| 85 | +
|
| 86 | + If the attribute appears multiple times, the sum of all occurrences is |
| 87 | + used. |
| 88 | +
|
| 89 | + **TODO**: specify how null values are treated with respect to minimal |
| 90 | + cost, can the algorithm diverge?*) |
| 91 | + |
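The default-cost and summation rules stated in "Meaning of costs" can be sketched as below. The `symbol_info` record and its field names are hypothetical, introduced only for this note; the actual implementation reads attributes through `MenhirSdk.Cmly_api` and returns a `Cost.t`.

```ocaml
(* Hypothetical summary of what is known about a symbol. *)
type symbol_info = {
  is_terminal : bool;
  has_semantic_value : bool;
  cost_attrs : float list;  (* values of every `recover.cost` attribute found *)
}

(* Default: 0.0 for a terminal without semantic value, +infinity otherwise. *)
let default_cost s =
  if s.is_terminal && not s.has_semantic_value then 0.0 else infinity

(* With no attribute, use the default; otherwise sum all occurrences. *)
let cost s =
  match s.cost_attrs with
  | [] -> default_cost s
  | costs -> List.fold_left ( +. ) 0.0 costs
```

For example, under these assumptions a `%token DOT` with no attribute costs `0.0`, a `%token<int> INT` with no attribute costs `infinity`, and a symbol carrying both `[@recover.cost 1.0]` and `[@recover.cost 2.0]` costs `3.0`.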
| 92 | +(** Recovery expressions |
| 93 | +
|
| 94 | + Symbols with a semantic value cannot be picked by the recovery algorithm |
| 95 | + if it does not know how to produce this value. |
| 96 | +
|
| 97 | + The `recover.expr` attribute associates an OCaml expression to a symbol. |
| 98 | + This expression should evaluate to a semantic value for this symbol. |
| 99 | +
|
| 100 | + %token<string> IDENT [@recover.expr "invalid-identifier"] |
| 101 | +
|
| 102 | + When applied to non-terminals, it is particularly useful to produce a |
| 103 | + value that could not be the result of a normal parse. |
| 104 | +
|
| 105 | + expr [@recover.expr Invalid_expression]: ... ; |
| 106 | +
|
| 107 | + Here `Invalid_expression` is a node added to the AST for the purpose of |
| 108 | + identifying parts that were recovered. |
| 109 | +
|
| 110 | + Furthermore, specifying fallback values for non-terminals prevents |
| 111 | + Menhir-recover from generating a hard-to-predict sequence of tokens |
| 112 | + just for filling holes in the AST.*) |
| 113 | + |
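To make the `Invalid_expression` idea concrete, here is a small sketch of an AST with a dedicated recovery node and a helper that detects recovered parts. Only the constructor name `Invalid_expression` comes from the comment above; the other constructors and the `has_recovered` function are hypothetical.

```ocaml
(* Hypothetical expression AST. [Invalid_expression] is never produced by a
   normal parse; it only appears as the value of a `recover.expr` attribute. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Invalid_expression

(* Returns true if any part of the tree was filled in by the recovery. *)
let rec has_recovered = function
  | Invalid_expression -> true
  | Int _ -> false
  | Add (a, b) -> has_recovered a || has_recovered b
```

A later pass (a type checker or a pretty-printer, say) could use such a predicate to skip or flag the recovered subtrees instead of reporting follow-up errors on them.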
| 114 | +val default_terminal : G.terminal -> string option |
| 115 | +(** An optional OCaml expression that should evaluate to a semantic value |
| 116 | + valid for this terminal.*) |
| 117 | + |
| 118 | +val default_nonterminal : G.nonterminal -> string option |
| 119 | +(** An optional OCaml expression that should evaluate to a semantic value |
| 120 | + valid for this non-terminal.*) |
| 121 | + |
| 122 | +(** The expressions are evaluated every time a new instance of a symbol is |
| 123 | + needed, although it is not specified whether every evaluation will be kept |
| 124 | + in the final solution (at run time, the algorithm is free to explore |
| 125 | + different branches and throw them away as needed). |
| 126 | +
|
| 127 | + **TODO**: decide how information can be communicated with recovery |
| 128 | + expressions (for instance the current location of the parser)*) |
| 129 | + |
| 130 | +(** Recovery prelude |
| 131 | +
|
| 132 | + The `recover.prelude` attribute is attached to the grammar. |
| 133 | +
|
| 134 | + It is an arbitrary piece of OCaml code that will be inserted before the |
| 135 | + code of `recover.expr` expressions. |
| 136 | +
|
| 137 | + It is useful for definitions shared by the recovery expressions, |
| 138 | + in the same way as `%\{ ... %\}` is used to share definitions in semantic |
| 139 | + actions of the grammar.*) |
| 140 | + |
| 141 | +val default_prelude : Format.formatter -> unit |
| 142 | +(** Output the grammar prelude to this formatter*) |
| 143 | +end |
| 144 | + |
| 145 | +module Recover_attributes (G : MenhirSdk.Cmly_api.GRAMMAR) : |
| 146 | +ATTRIBUTES with module G = G |