[antlr-interest] Help with pesky Lexer determinism

Mark Bednarczyk voytechs at yahoo.com
Mon Jun 6 17:15:50 PDT 2005


Well, I have another problem that is a little more involved, so
maybe I can just get a couple of quick pointers. It's the same
issue, but now with IPv6 addresses, which step on the toes of the
IDENT rule: an IPv6 address is made up of HEX digits, so
'a'..'f' overlaps with the 'a'..'z' range of the IDENT rule.

BTW, for those not familiar, here is the format of IPv6 in the
simple case: (HEX HEX COLON (COLON HEX HEX)+).
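To make the simple-case shape concrete, here is a rough Java sketch that checks a string against it outside of ANTLR. It assumes each group is one or two hex digits, the way NUM_HEX_2DIGIT is used in the NUM_INT rule further down, and that there is at least one colon-separated group after the first. The class and method names are made up for illustration and are not part of the grammar.

```java
import java.util.regex.Pattern;

// Hypothetical helper, only to illustrate the simple-case format.
class SimpleIpv6Format {
    // One or two hex digits, followed by one or more ":" + one or two hex digits.
    // Assumption: mirrors NUM_HEX_2DIGIT (COLON NUM_HEX_2DIGIT)+ from the grammar.
    private static final Pattern SIMPLE =
        Pattern.compile("[0-9A-Fa-f]{1,2}(:[0-9A-Fa-f]{1,2})+");

    // Returns true only if the whole string has the simple-case shape.
    static boolean matches(String text) {
        return SIMPLE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("fe:80:0a:1b"));
        System.out.println(matches("feed"));
    }
}
```

Note that a plain word such as "feed" consists only of hex letters, which is exactly why the first character alone cannot separate an address from an identifier.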

This is what I'm trying to do, but not really sure how to code
it.

1) Add the IPv6 block to the NUM_INT rule with the appropriate
predicate of (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON). With
this I do not get any warning from the NUM_INT rule itself.

2) Add the appropriate predicate to the IDENT rule for the IPv6
format (same as #1) and provide an empty condition block, or
somehow tell the IDENT rule to fail based on the predicate, so
that the lexer moves on and tries NUM_INT, which should succeed.

Somehow I need the IDENT rule to fail on IPv6 address while
matching on NUM_INT. Almost looks like I need to move both rules
into a bigger common rule and manually set the token type.
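What I mean by "look ahead, then set the token type by hand" can be sketched in plain Java, outside of ANTLR. This is only an illustration under assumptions: the class, the method names, and the token-type constants are invented here, and the lookahead simply mirrors the predicate (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON) with a greedy one-or-two-digit group.

```java
// Hypothetical sketch; none of these names come from the generated lexer.
class IdentOrIpv6 {
    static final int OTHER = 0;
    static final int IDENT = 1;
    static final int IP_V6 = 2;

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isIdentStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '$';
    }

    // Greedily consumes one or two hex digits; returns the new position or -1.
    static int hexGroup(String s, int pos) {
        if (pos >= s.length() || !isHex(s.charAt(pos))) return -1;
        pos++;
        if (pos < s.length() && isHex(s.charAt(pos))) pos++;
        return pos;
    }

    // Lookahead only: hex group, colon, hex group, colon.
    static boolean looksLikeIpv6(String s, int pos) {
        for (int i = 0; i < 2; i++) {
            pos = hexGroup(s, pos);
            if (pos < 0 || pos >= s.length() || s.charAt(pos) != ':') return false;
            pos++; // step over the colon
        }
        return true;
    }

    // Check the IPv6 lookahead first, then fall back to the identifier start set.
    static int classify(String s, int pos) {
        if (looksLikeIpv6(s, pos)) return IP_V6;
        if (pos < s.length() && isIdentStart(s.charAt(pos))) return IDENT;
        return OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("fe:80:0a", 0));
        System.out.println(classify("feed", 0));
    }
}
```

The point of the ordering is that the IPv6 check has to win before the identifier alternative is even considered, since every hex letter is also a legal identifier start.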

Errors I'm getting now:
    [antlr] ANTLR Parser Generator   Version 2.7.5 (20050128) 1989-2005 jGuru.com
    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g: warning:lexical nondeterminism between rules IDENT and NUM_INT upon
    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g: k==1:'A'..'F','a'..'f'
    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g: k==2:<end-of-token>,'0'..'9','A'..'F','L','X','a'..'f','l','x'
    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g: k==3:<end-of-token>,'0'..'9','A'..'F','L','a'..'f','l'
    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g: k==4:<end-of-token>,'0'..'9','A'..'F','L','a'..'f','l'
    [antlr] warning: public lexical rule IDENT is optional (can match "nothing")


Here are the IDENT rule and the relevant portion of NUM_INT. I'm
skipping the bottom of NUM_INT since it's not the problem and is
exactly the same as in java.g.

IDENT
options {
    testLiterals=true;
}
    :   (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON)=>
        // EMPTY match
    |   ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
    ;


// a numeric literal
NUM_INT
    {boolean isDecimal=false; Token t=null;}
    :   (NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT)=>
        (
            NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT
            { _ttype = IP_V4; }
        )
    |   (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON)=>
        (
            NUM_HEX_2DIGIT (COLON NUM_HEX_2DIGIT)+
            { _ttype = IP_V6; }
        )
 < T R U N K A T E D>

protected NUM_HEX_2DIGIT: HEX_DIGIT (HEX_DIGIT)? ;




