[antlr-interest] Lexer rule for INTEGER and COMMA_INTEGER

Sat Nov 3 08:26:37 PDT 2012

Hi,

I have a lexer grammar that has to recognize INTEGER like 1234 and
COMMA_INTEGER like 1,234,567.
The latter integer token has commas in it, and of course the language has
other places that use a comma, e.g. F(1, x) is valid, which contains "1,"
that should be recognized as an INTEGER 1 followed by a comma.
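
To make the intended behaviour concrete, here is a rough hand-written sketch
in plain Java. It is not ANTLR-generated code and not a proposed fix for the
grammar; the class name CommaIntegerSketch and the OTHER label are made up
for illustration. It only shows the rule I am after: a comma belongs to the
number only when a digit follows it directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is not ANTLR-generated code.
class CommaIntegerSketch {

    // True when position i is inside the string and holds an ASCII digit.
    static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    // Splits the input into "TYPE:text" strings, skipping spaces.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ') {
                i++;
                continue;
            }
            if (isDigit(s, i)) {
                int start = i;
                while (isDigit(s, i)) i++;
                boolean grouped = false;
                // Take a comma only when a digit follows it directly
                // (two characters of lookahead).
                while (i < s.length() && s.charAt(i) == ',' && isDigit(s, i + 1)) {
                    grouped = true;
                    i++;                        // consume ','
                    while (isDigit(s, i)) i++;  // consume the digit group
                }
                String type = grouped ? "COMMA_INTEGER:" : "INTEGER:";
                tokens.add(type + s.substring(start, i));
            } else {
                // Everything else is a one-character placeholder token.
                tokens.add("OTHER:" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("F(1, x)"));
        System.out.println(tokenize("1,234,567"));
    }
}
```

With this rule "1, " yields INTEGER 1 and then a separate comma, while
1,234,567 is a single COMMA_INTEGER. Note that F(1,2) written without a space
would also come out as one COMMA_INTEGER under this rule.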

This is similar to the "lexer rule for floating point, integer and range
operator" example given in the ANTLR wiki. There the conflict is around the
period; here it is around the comma.

However, I tried the approaches suggested by that example, but cannot get it
right. The following is one version of my lexer rules, using a semantic
predicate:
    COMMA_INTEGER:(('0'..'9')+ {input.LA(1)==',' && input.LA(2)>='0' && input.LA(2)<='9'}?=>(',' ('0'..'9')+)+);
    INTEGER:('0'..'9')+;
This version results in the error
    "rule COMMA_INTEGER failed predicate: {input.LA(1)==',' && input.LA(2)>='0' && input.LA(2)<='9'}? "
for input "1, " as in F(1, x).

The following version uses a syntactic predicate
    COMMA_INTEGER:(('0'..'9')+ (',' ('0'..'9')+)=>(',' ('0'..'9')+)+);//TODO-COMMA_integer different from RES
    INTEGER:('0'..'9')+;
and results in the error
    "required (...)+ loop did not match anything at character ' ' "
 (character SPACE)

Swapping the order of INTEGER and COMMA_INTEGER does not change the
errors.

So it looks like the lexer is predicting the next token without running the
predicates, i.e. it goes directly to matching COMMA_INTEGER upon seeing a
comma after some digits.

Any suggestions? Thanks!

-- 
Regards,

Yang, Zhaohui