[antlr-interest] Lexical error recovery by manual symbol (character) insertion/deletion?

Fri Feb 15 17:36:59 PST 2008

Hi Gavin,

I think you've analysed this a lot more deeply than I have. Your
responses have been really helpful in improving my understanding,
so thank you! :)

> I agree.  I find it a bit irritating that I can't say "I'm
> creating this rule just for convenience; it doesn't need a token
> type id".  Although I'd probably be happier with something like
> this:
>
> tokens { FLOAT; }          // imaginary: type id generated but no warning
> fragment DIGIT: '0'..'9';  // fragment: no type id generated
> fragment NUMBER: DIGIT+;   // again, no type id
> INT                        // type id generated
>    : NUMBER
>     ( (DOT DIGIT) => DOT NUMBER { $type = FLOAT; } )?
>   ;
>

Yes, agreed. I tried a similar syntax early on as the use of tokens { ... }
for lexer rules seems fairly natural. However, I used the same syntax a little
differently:

tokens { FLOAT; INT; }
fragment DIGIT: '0'..'9';
fragment NUMBER
  : DIGIT+ => INT // making this explicit is good documentation
    ( (DOT DIGIT) => DOT NUMBER { $type = FLOAT; } )?
  ;

This is a little more self-documenting (to my eyes), at the expense of being
a little more verbose. Using the rule name's type id in the default case is
clever, but there are cases where it would not work so well:

fragment DIRECTIVE:
  '-' (
        'define' => DEFINE
      | 'include' => INCLUDE
      | 'if' => IF
      | ... etc
    );

In this case the '-' alone has no real meaning. So using the parent rule's
type id could be seen as a clever optimization from a certain perspective.
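
For comparison, here is a rough sketch of how the DIRECTIVE example above
might be written in stock ANTLR 3, where the "=> TYPE" shorthand is only a
proposal and, as far as I know, not accepted by the tool. It is untested. The
grammar name Directives is made up for illustration, and the rule is declared
without "fragment" on the assumption that it should emit tokens itself:

```
lexer grammar Directives;   // hypothetical grammar name

// Imaginary token types: a type id is generated for each,
// but no lexer rule of the same name exists.
tokens { DEFINE; INCLUDE; IF; }

// Non-fragment rule, so the lexer emits a token when it matches.
// Each alternative overrides the default type (DIRECTIVE) via $type.
DIRECTIVE
  : '-'
    ( 'define'  { $type = DEFINE; }
    | 'include' { $type = INCLUDE; }
    | 'if'      { $type = IF; }
    )
  ;
```

Written this way, DIRECTIVE still receives a type id of its own even though
no token of that type is ever emitted, and the tool may warn about the
imaginary tokens. Those are the two irritations the proposed syntax is meant
to remove.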

Regards,

Darach.