[antlr-interest] Lexical error recovery by manual symbol (character) insertion/deletion?
Darach Ennis
darach at gmail.com
Fri Feb 15 17:36:59 PST 2008
Hi Gavin,
I think you've analysed this a lot more deeply than I have. Your
responses are really helping my understanding, so thank you! :)
> I agree. I find it a bit irritating that I can't say "I'm
> creating this rule just for convenience; it doesn't need a token
> type id". Although I'd probably be happier with something like
> this:
>
> tokens { FLOAT; } // imaginary: type id generated but no warning
> fragment DIGIT: '0'..'9'; // fragment: no type id generated
> fragment NUMBER: DIGIT+; // again, no type id
> INT // type id generated
> : NUMBER
> ( (DOT DIGIT) => DOT NUMBER { $type = FLOAT; } )?
> ;
>
Yes, agreed. I tried a similar syntax early on, as the use of
tokens { ... } for lexer rules seems fairly natural. However, I used
the same syntax a little differently:
tokens { FLOAT; INT; }
fragment DIGIT: '0'..'9';
fragment NUMBER
: DIGIT+ => INT // making this explicit is good documentation
( (DOT DIGIT) => DOT NUMBER { $type = FLOAT; } )?
;
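To make the intended behaviour concrete, here is a rough Python sketch
of what the rule above is meant to do. It is not ANTLR syntax and not
what ANTLR generates; the function name match_number and the string
token types are made up for illustration. The default type is INT, and
the type is switched to FLOAT only when a dot is followed by at least
one digit, mirroring the (DOT DIGIT) => predicate.

```python
import re

# Hypothetical token type ids; a real lexer would use integers.
INT, FLOAT = "INT", "FLOAT"

_DIGITS = re.compile(r"[0-9]+")        # NUMBER: DIGIT+
_FRACTION = re.compile(r"\.[0-9]+")    # DOT NUMBER, guarded by (DOT DIGIT) =>


def match_number(text, pos=0):
    """Return (token_type, lexeme) for a number at pos, or None."""
    whole = _DIGITS.match(text, pos)
    if whole is None:
        return None
    token_type, end = INT, whole.end()   # default case: INT
    frac = _FRACTION.match(text, end)
    if frac is not None:                 # predicate succeeded: switch type
        token_type, end = FLOAT, frac.end()
    return token_type, text[pos:end]
```

With this sketch a trailing dot with no digit after it is left
unconsumed, so "7." yields an INT lexeme of "7".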
Spelling out the type with "=> INT" is a little more self-documenting
(to my eyes), at the expense of being a little more verbose. Using the
rule name's type id in the default case is clever, but there are cases
where it would not work so well:
fragment DIRECTIVE:
'-' (
'define' => DEFINE
| 'include' => INCLUDE
| 'if' => IF
| ... etc
);
In this case the '-' alone has no real meaning. So using the parent
rule's type id could be seen as a clever optimization from a certain
perspective.
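As a rough illustration of the DIRECTIVE case, here is a Python sketch
of a rule where every alternative picks its own type and the rule has
no sensible default. The names match_directive and DIRECTIVE_TYPES are
hypothetical, and only the three keywords listed above are covered.

```python
# Hypothetical mapping from keyword to token type id.
DIRECTIVE_TYPES = {"define": "DEFINE", "include": "INCLUDE", "if": "IF"}


def match_directive(text, pos=0):
    """Return (token_type, lexeme) for a '-' directive at pos, or None."""
    if not text.startswith("-", pos):
        return None
    # Try longer keywords first so a short keyword can never shadow
    # a longer one that shares its prefix.
    for keyword in sorted(DIRECTIVE_TYPES, key=len, reverse=True):
        if text.startswith(keyword, pos + 1):
            end = pos + 1 + len(keyword)
            return DIRECTIVE_TYPES[keyword], text[pos:end]
    # A bare '-' has no type of its own, so there is nothing to return.
    return None
```

Here a lone '-' produces no token at all, which is the situation where
falling back to the rule's own type id would be of little use.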
Regards,
Darach.