[antlr-interest] Lexical error recovery by manual symbol (character) insertion/deletion?

Fri Feb 15 23:47:45 PST 2008

At 14:36 16/02/2008, Darach Ennis wrote:
>tokens { FLOAT; INT; }
>fragment DIGIT: '0'..'9';
>fragment NUMBER
>   : DIGIT+ => INT // making this explicit is good documentation
>     ( (DOT DIGIT) => DOT NUMBER { $type = FLOAT; } )?
>   ;

The problem is that if NUMBER is a fragment, it will never be
called from the root Tokens rule, and when another lexer rule
calls it, it can't emit a token of its own, so it's fairly
pointless :)  You're also needlessly creating a NUMBER rule (and
then using it inappropriately in the FLOAT branch: the recursive
call to NUMBER after the DOT would accept input like 1.2.3).

To fully expand the rules I posted, making everything explicit 
(this almost works as is -- you'd probably have to convert the 
FLOAT token to a non-empty fragment rule to avoid a warning 
though):

tokens { FLOAT; }
fragment DIGIT: '0'..'9';
INT
   : DIGIT+
     (  /* nothing -- it's an INT */
     |  (DOT DIGIT) => DOT DIGIT+ { $type = FLOAT; }
     )
   ;

The "default" output of the rule should be the name of the rule 
itself.  The additional cases just allow for a bit of flexibility 
in case two output tokens have common indefinite-length prefixes.
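
If it helps to see the control flow outside the grammar, here is a
rough hand-written equivalent of the INT rule above, in Python.
This is only a sketch of the idea, not what ANTLR generates; the
function name lex_number and the tuple it returns are made up for
illustration.

```python
DIGITS = "0123456789"

def lex_number(text, pos=0):
    """Match DIGIT+ at pos; return (token_type, lexeme, next_pos) or None."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        return None  # no digits at all, so the rule does not match

    # Default output: the token type is the name of the rule itself.
    token_type = "INT"

    # (DOT DIGIT) => ... : peek at two characters without consuming them.
    if pos + 1 < len(text) and text[pos] == "." and text[pos + 1] in DIGITS:
        pos += 1  # consume DOT
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1  # consume DIGIT+
        token_type = "FLOAT"  # same effect as { $type = FLOAT; }

    return token_type, text[start:pos], pos
```

The lookahead is what keeps "12." from being swallowed as a broken
FLOAT: the dot is only consumed when a digit follows it, so the
rule stops after "12" and leaves the dot for some other rule.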

>fragment DIRECTIVE:
>   '-' (
>         'define' => DEFINE
>       | 'include' => INCLUDE
>       | 'if' => IF
>       | ... etc
>     );

Again, if that's a fragment it won't ever be called, so none of
those token types would be produced.  However, what you're trying
to do there is (usually) fairly straightforward with the current
syntax.  I say "usually" because with certain patterns the lexer
can sometimes get confused, but that's being worked on:

tokens {
   DEFINE = '-define';
   INCLUDE = '-include';
   IF = '-if';
   ...
}
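
Conceptually, each entry in that tokens block just ties a literal
string to a token type.  A minimal Python sketch of that mapping,
assuming a plain longest-literal-first match (the names DIRECTIVES
and lex_directive are hypothetical, and this is not how the
generated lexer actually decides between alternatives):

```python
# Only the three literals named above; the elided entries are left out.
DIRECTIVES = {
    "-define": "DEFINE",
    "-include": "INCLUDE",
    "-if": "IF",
}

def lex_directive(text, pos=0):
    """Return (token_type, next_pos) for a directive literal at pos, else None."""
    # Try longer literals first so a longer directive is never cut short
    # by a shorter one that happens to be its prefix.
    for literal in sorted(DIRECTIVES, key=len, reverse=True):
        if text.startswith(literal, pos):
            return DIRECTIVES[literal], pos + len(literal)
    return None  # not a known directive; no word-boundary check is done here
```

For example, lex_directive("-define FOO") yields the DEFINE type
and the position just past the literal, and anything that does not
start with one of the listed literals yields None.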