[antlr-interest] Lexer rule for INTEGER and COMMA_INTEGER

Sat Nov 3 23:41:56 PDT 2012

Thanks a lot Jim. Your suggestions are right to the point. This is the
working version I have for now:

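// Declared only so that the COMMA_INTEGER token type exists; never matched directly.
fragment
COMMA_INTEGER:;
INTEGER:('0'..'9')+
  ( // Gated alternative: entered only when a comma is followed by a digit.
    ({input.LA(1)==',' && input.LA(2)>='0' && input.LA(2)<='9'}?=>
      ',' ('0'..'9')+)+ {$type = COMMA_INTEGER;}
    | {$type=INTEGER;} // Explicit empty alternative: a plain integer.
  );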

For those who might be confused by Jim's answer, my explanation is that
there are 2 things to keep in mind:
1. You have to use gated semantic predicates ( {...}?=> ) for lexer rules.
A syntactic predicate ( (...)=> ) has no effect (in lexer rules). E.g., in the
above, replacing
    {input.LA(1)==',' && input.LA(2)>='0' && input.LA(2)<='9'}?=>
  with
    (',' ('0'..'9'))=>
  will still result in the error "required (...)+ loop did not match
anything", as if the syntactic predicate were not there at all.

2. ANTLR generates 'gated predicate code' only for alternatives other than
the last one. So you need more than one alternative to make a gated
predicate a real gate. That's why in the above INTEGER rule, there is an
explicit empty alternative. Without this alternative, the predicate will not
prevent the lexer from entering the COMMA_INTEGER alternative when it sees a
comma without a following digit, and an error is triggered only after the
wrong route has been taken.
   Well, that's my explanation of Jim's answer "The predicates require that
you cover the positive and negative alts
basically".
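
To make the two-character lookahead concrete, here is a small hand-written
Java sketch of the decision the gated predicate is meant to make. It is not
the code ANTLR generates and it does not use the ANTLR runtime; the class
name CommaIntegerSketch, the tokenize method and the OTHER kind are made up
for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration; not ANTLR-generated code.
class CommaIntegerSketch {

    // True when position i is inside the string and holds '0'..'9'.
    private static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    // Returns tokens as "KIND:text" strings. Every non-digit character
    // becomes its own OTHER token (an assumption made to keep this short).
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (!isDigit(s, i)) {
                out.add("OTHER:" + s.charAt(i));
                i++;
                continue;
            }
            int start = i;
            while (isDigit(s, i)) {
                i++;                       // ('0'..'9')+
            }
            String kind = "INTEGER";       // the empty alternative
            // Same test as the gate: LA(1) is ',' and LA(2) is a digit.
            // Only then is the comma consumed as part of the number.
            while (i < s.length() && s.charAt(i) == ',' && isDigit(s, i + 1)) {
                i++;                       // ','
                while (isDigit(s, i)) {
                    i++;                   // ('0'..'9')+
                }
                kind = "COMMA_INTEGER";
            }
            out.add(kind + ":" + s.substring(start, i));
        }
        return out;
    }
}
```

With this sketch, tokenize("1,234,567") gives a single COMMA_INTEGER token,
while in tokenize("F(1, x)") the 1 stays an INTEGER and the comma is left
for a separate token, because the character after the comma is a space.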

2012/11/4 Jim Idle <jimi at temporal-wave.com>

> You will need to use gated semantic predicates I think. Unless you are in
> charge of the language, then you can stop it being so dumb ;)
>
> The predicates require that you cover the positive and negative alts
> basically, or you will get the failed predicate message.
>
> Jim
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Zhaohui Yang
> Sent: Saturday, November 03, 2012 11:27 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Lexer rule for INTEGER and COMMA_INTEGER
>
> Hi,
>
> I have a lexer grammar that has to recognize INTEGER like 1234 and
> COMMA_INTEGER like 1,234,567. The latter integer token has commas in it, and
> of course the language has other places that use a comma, e.g. F(1, x) is
> valid, which contains "1,"
> that should be recognized as an INTEGER 1 followed by a comma.
>
> This is similar to the "lexer rule for floating point, integer and range
> operator" example given in the ANTLR wiki. There the conflict is around
> the period, here it is the comma.
>
> However, I tried the ways suggested by the example, but cannot get it
> right. The following is one version of my lexer rules, using a gated
> semantic predicate:
>     COMMA_INTEGER:(('0'..'9')+ {input.LA(1)==',' && input.LA(2)>='0' &&
> input.LA(2)<='9'}?=>(',' ('0'..'9')+)+);
>     INTEGER:('0'..'9')+;
> This version results in error
>     "rule COMMA_INTEGER failed predicate: {input.LA(1)==',' &&
> input.LA(2)>='0' && input.LA(2)<='9'}? " for input "1, " as in F(1, x)
>
> The following version uses a syntactic predicate
>     COMMA_INTEGER:(('0'..'9')+ (',' ('0'..'9')+)=>(','
> ('0'..'9')+)+);//TODO-COMMA_integer different from RES
>     INTEGER:('0'..'9')+;
> and results in error
>     "required (...)+ loop did not match anything at character ' ' "
>  (character SPACE)
>
> Swapping the order of INTEGER and COMMA_INTEGER does not change the
> errors.
>
> So it looks like the lexer is predicting next token without running the
> predicates, i.e. it goes directly to match COMMA_INTEGER upon seeing a
> comma after some digits.
>
> Any suggestion? Thanks!
>
> --
> Regards,
>
> Yang, Zhaohui
>

-- 
Regards,

Yang, Zhaohui