[antlr-interest] Help with pesky Lexer determinism

Mark Bednarczyk voytechs at yahoo.com
Mon Jun 6 17:24:45 PDT 2005


BTW: while writing the previous email and suggesting to myself
that I combine the rules, I did combine them and it worked
perfectly.

So my previous question still stands, as I'd hate to deviate from
the more general inherited java.g rule set. Can this be done in a
more intuitive way?

// a numeric literal
NUM_INT
    {boolean isDecimal=false; Token t=null;}
    :   (NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT)=>
        (
            NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT
            { _ttype = IP_V4; }
        )
    |   (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON)=>
        (
            NUM_HEX_2DIGIT (COLON NUM_HEX_2DIGIT)+
            { _ttype = IP_V6; }
        )
    |   ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
        { _ttype = IDENT; }

Works great.
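
For anyone following along, the order in which the combined rule tries
its alternatives can be imitated in plain Java. This is only a sketch
under assumptions, not what ANTLR generates: the class name, the enum
and the regular expressions are made up for illustration, NUM_3DIGIT is
assumed to mean one to three decimal digits (its definition is not
shown here), and regex backtracking is only an approximation of how a
syntactic predicate scans ahead.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not ANTLR output: it imitates the order in which
// the combined rule tries its predicated alternatives.
public class TokenKindSketch {

    public enum Kind { IP_V4, IP_V6, IDENT, OTHER }

    // Assumption: NUM_3DIGIT is one to three decimal digits.
    private static final String NUM_3DIGIT = "[0-9]{1,3}";
    // From the grammar: NUM_HEX_2DIGIT is HEX_DIGIT (HEX_DIGIT)?
    private static final String NUM_HEX_2DIGIT = "[0-9A-Fa-f]{1,2}";

    // (NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT)=>
    private static final Pattern IPV4_PRED = Pattern.compile(
        NUM_3DIGIT + "\\." + NUM_3DIGIT + "\\." + NUM_3DIGIT + "\\." + NUM_3DIGIT);

    // (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON)=>
    private static final Pattern IPV6_PRED = Pattern.compile(
        NUM_HEX_2DIGIT + ":" + NUM_HEX_2DIGIT + ":");

    // First character accepted by the IDENT alternative.
    private static final Pattern IDENT_START = Pattern.compile("[A-Za-z_$]");

    public static Kind classify(String input) {
        // lookingAt() tests only a prefix of the input, much like a
        // syntactic predicate looks ahead without consuming anything.
        if (IPV4_PRED.matcher(input).lookingAt()) return Kind.IP_V4;
        if (IPV6_PRED.matcher(input).lookingAt()) return Kind.IP_V6;
        if (IDENT_START.matcher(input).lookingAt()) return Kind.IDENT;
        // The remaining java.g numeric alternatives are not modelled here.
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"192.168.0.1", "fe:80:00:01", "fed", "42"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The point is the ordering: "fed" starts with hex digits, but because
no colon follows, the IPv6 predicate fails and the input falls through
to the IDENT alternative, which is what a separate IDENT rule could
not arrange on its own.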

Cheers,

mark...


>-----Original Message-----
>From: Mark Bednarczyk [mailto:voytechs at yahoo.com]
>Sent: Monday, June 06, 2005 8:16 PM
>To: ANTLR Interest
>Subject: RE: [antlr-interest] Help with pesky Lexer determinism
>
>
>Well, I have another problem that is a little more
>involved, so maybe I can just get a couple of quick
>pointers. It's the same issue, but now with IPv6 addresses,
>which step on the toes of the IDENT rule: an IPv6
>address is made up of hex digits, so 'a'..'f' overlaps
>with the IDENT rule's 'a'..'z'.
>
>BTW: here is the format of IPv6 for those not
>familiar: (HEX HEX (COLON HEX HEX)+) in the simple case.
>
>This is what I'm trying to do, but not really sure how
>to code it.
>
>1) Add the IPv6 block to the NUM_INT rule with an
>appropriate predicate of (NUM_HEX_2DIGIT COLON
>NUM_HEX_2DIGIT COLON). With this I do not get any
>warning from the NUM_INT rule.
>
>2) Add the same predicate to the IDENT rule for the
>IPv6 format (same as #1) and provide an empty match,
>or somehow tell the IDENT rule to fail based on the
>predicate, so the lexer will move on and try NUM_INT,
>which should succeed.
>
>Somehow I need the IDENT rule to fail on IPv6 address
>while matching on NUM_INT. Almost looks like I need to
>move both rules into a bigger common rule and manually
>set the token type.
>
>Errors I'm getting now:
>    [antlr] ANTLR Parser Generator   Version 2.7.5 (20050128)   1989-2005 jGuru.com
>    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g: warning:lexical nondeterminism between rules IDENT and NUM_INT upon
>    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g:     k==1:'A'..'F','a'..'f'
>    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g:     k==2:<end-of-token>,'0'..'9','A'..'F','L','X','a'..'f','l','x'
>    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g:     k==3:<end-of-token>,'0'..'9','A'..'F','L','a'..'f','l'
>    [antlr] /home/markbe/prjs/jnetutils-0.1.0/src/antlr/npl/npl.g:     k==4:<end-of-token>,'0'..'9','A'..'F','L','a'..'f','l'
>    [antlr] warning: public lexical rule IDENT is optional (can match "nothing")
>
>
>And here is the relevant portion of the grammar. I'm
>skipping the bottom of NUM_INT since it's not the
>problem and is exactly the same as in java.g:
>
>IDENT
>options {
>    testLiterals=true;
>}
>    :   (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON)=>
>        // EMPTY match
>    |   ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
>    ;
>
>
>// a numeric literal
>NUM_INT
>    {boolean isDecimal=false; Token t=null;}
>    :   (NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT)=>
>        (
>            NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT '.' NUM_3DIGIT
>            { _ttype = IP_V4; }
>        )
>    |   (NUM_HEX_2DIGIT COLON NUM_HEX_2DIGIT COLON)=>
>        (
>            NUM_HEX_2DIGIT (COLON NUM_HEX_2DIGIT)+
>            { _ttype = IP_V6; }
>        )
> < T R U N C A T E D >
>
>protected NUM_HEX_2DIGIT: HEX_DIGIT (HEX_DIGIT)? ;
>
>



