[antlr-interest] Re: Antlr noobie, nondeterminism abounds

lgcraymer lgc at mail1.jpl.nasa.gov
Sat May 8 23:53:36 PDT 2004


You are trying to do too much in the lexer.  Consider factoring things
a bit:


--- In antlr-interest at yahoogroups.com, "WesSantee" <jws01 at t...> wrote:
> Greetings all,
> 
> I'm just getting started in antlr (and language recognition in
> general).  I've been looking through some of the examples, but after
> trying my hand at my own grammar, I'm getting lexical nondeterminism
> warnings everywhere, and I don't know how to get around them.
> 
> Here's an example:
> 
> 1) Create a lexer rule representing exactly four decimal digits.
> 2) Create a lexer rule representing any 32-bit unsigned decimal value.

These should be alternatives in a single rule:

num
    :
    DIGIT (DIGIT (DIGIT)? )?
    |  DIGIT DIGIT DIGIT DIGIT ( (DIGIT)+ | { _ttype = DIG4; }
    ;

where DIGIT is protected and DIG4 is defined in the tokens section.

> 
> Since #2 is a superset of #1, I get nondeterminism warnings.  And no
> wonder.  The problem is, I can't think of a way to disambiguate one
> from the other.  Bumping k is only going to help in the case where the
> number is >4 digits.
> 
> Here's another example:
> 
> 1) Create a lexer rule representing ASCII characters 1..7F (call it
> CHAR).
> 2) Create a lexer rule representing ASCII characters 1..FF (call it
> CHAR8).
> 3) Create a lexer rule represeting everything in CHAR *except* '\r'
> and '\n'.
> 
> Here again, CHAR8 is a superset of CHAR, and #3 is even crazier.

Disambiguate 1 and 2:  CHAR should cover the range 1 .. 7F, while
CHAR8HI should be 80 .. FF; then define

char8
   :  CHAR | CHAR8HI ;

in the parser (or just use the phrase CHAR | CHAR8HI)

And for 3, extend the factoring--or use something like ~NEWLINE in the
parser.

--Loring

> 
> It may be that only people suffering from lexical muddy thinking would
> think to implement rules like this, and that may very well be my
> problem. :)  I'm trying to translate an ABNF formal syntax (the IMAP
> protocol, RFC 3501 to be specific) into an antlr grammar, and so far
> straight translation is not working very well.  
> 
> If you have solutions to the examples above, or recommendations on how
> to cure lexical muddy thinking disease, I'm open to suggestions!
> 
> Cheers,
> -Wes



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list