[antlr-interest] Lexical error recovery by manual symbol (character) insertion/deletion?
Gavin Lambert
antlr at mirality.co.nz
Fri Feb 15 11:42:46 PST 2008
At 04:34 16/02/2008, Darach Ennis wrote:
>After some trial and error and a little brain-stretching the
>following seems to work:
>
>F: ('0' | '1'..'9' '0'..'9'*)
> (
> { input.LA(1) == '.' && Character.isDigit(input.LA(2))
> }?=> ('.' '0'..'9'+) { _type = F; }
> | { _type = I; }
> )
> ;
First: don't use _type (that's an implementation detail). Use
$type instead.
Second: solutions to this issue have been posted several times
before; a common alternative solution is:
// DOT is assumed to be defined elsewhere in the grammar, e.g. DOT: '.';
fragment DIGIT: '0'..'9';
fragment NUMBER: DIGIT+;
fragment FLOAT: NUMBER DOT NUMBER;
INT
    : NUMBER
      ( (DOT DIGIT) => DOT NUMBER { $type = FLOAT; } )?
    ;
(Or you could replace that first NUMBER in the INT rule with ('0'
| '1'..'9' DIGIT*) if you wanted to ensure leading zeros were
invalid.)
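To spell out what the (DOT DIGIT) => predicate in the INT rule does, here is
a rough hand-written Java sketch of the same decision. It is not the code
ANTLR generates; the class name NumberScanner, the "TYPE:text" string output
and the catch-all OTHER token are all made up for illustration. The point is
only that the dot is consumed when a digit follows it, and left for the next
token otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not ANTLR-generated code.
final class NumberScanner {

    // Returns the tokens found in s as "TYPE:text" strings.
    static List<String> scan(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!isDigit(c)) {
                // Anything that is not a digit becomes a one-character OTHER token.
                tokens.add("OTHER:" + c);
                i++;
                continue;
            }
            int start = i;
            // NUMBER: one or more digits.
            while (i < s.length() && isDigit(s.charAt(i))) {
                i++;
            }
            String type = "INT";
            // (DOT DIGIT) => : peek two characters ahead without consuming them.
            if (i + 1 < s.length() && s.charAt(i) == '.' && isDigit(s.charAt(i + 1))) {
                i++; // DOT
                while (i < s.length() && isDigit(s.charAt(i))) {
                    i++; // NUMBER
                }
                type = "FLOAT"; // plays the role of { $type = FLOAT; }
            }
            tokens.add(type + ":" + s.substring(start, i));
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(scan("1.5"));  // [FLOAT:1.5]
        System.out.println(scan("1..5")); // [INT:1, OTHER:., OTHER:., INT:5]
    }
}
```

With input such as "1..5" the lookahead fails after the first digit, so the
scanner emits an INT and leaves both dots for whatever rule handles them
next, which is the behaviour the predicate is there to guarantee.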
The actual contents of the FLOAT rule don't matter, since nothing in
the grammar ever invokes it; it exists only to define the FLOAT token
type. It's still usually preferable to make it look similar to what
it's going to represent.
FLOAT can actually be put into the tokens section instead, but
only if it has no content (since if it has content it becomes a
top-level rule, which isn't the goal); unfortunately doing this
causes ANTLR to emit a warning at present, which is why the dummy
fragment approach is usually preferred.
>The _type field should be defined in lexer fragment rules so that
>ambiguity such as the above can be resolved without making a rule
>public.
Lexer fragment rules never emit tokens, so $type is completely
meaningless for them. Any type-juggling must be done in the
top-level rule.