[antlr-interest] Re: Antlr noobie, nondeterminism abounds
lgcraymer
lgc at mail1.jpl.nasa.gov
Sat May 8 23:53:36 PDT 2004
You are trying to do too much in the lexer. Consider factoring things
a bit:
--- In antlr-interest at yahoogroups.com, "WesSantee" <jws01 at t...> wrote:
> Greetings all,
>
> I'm just getting started in antlr (and language recognition in
> general). I've been looking through some of the examples, but after
> trying my hand at my own grammar, I'm getting lexical nondeterminism
> warnings everywhere, and I don't know how to get around them.
>
> Here's an example:
>
> 1) Create a lexer rule representing exactly four decimal digits.
> 2) Create a lexer rule representing any 32-bit unsigned decimal value.
These should be alternatives in a single rule:
num
:
DIGIT (DIGIT (DIGIT)? )?
| DIGIT DIGIT DIGIT DIGIT ( (DIGIT)+ | { _ttype = DIG4; }
;
where DIGIT is protected and DIG4 is defined in the tokens section.
>
> Since #2 is a superset of #1, I get nondeterminism warnings. And no
> wonder. The problem is, I can't think of a way to disambiguate one
> from the other. Bumping k is only going to help in the case where the
> number is >4 digits.
>
> Here's another example:
>
> 1) Create a lexer rule representing ASCII characters 1..7F (call it
> CHAR).
> 2) Create a lexer rule representing ASCII characters 1..FF (call it
> CHAR8).
> 3) Create a lexer rule represeting everything in CHAR *except* '\r'
> and '\n'.
>
> Here again, CHAR8 is a superset of CHAR, and #3 is even crazier.
Disambiguate 1 and 2: CHAR should cover the range 1 .. 7F, while
CHAR8HI should be 80 .. FF; then define
char8
: CHAR | CHAR8HI ;
in the parser (or just use the phrase CHAR | CHAR8HI)
And for 3, extend the factoring--or use something like ~NEWLINE in the
parser.
--Loring
>
> It may be that only people suffering from lexical muddy thinking would
> think to implement rules like this, and that may very well be my
> problem. :) I'm trying to translate an ABNF formal syntax (the IMAP
> protocol, RFC 3501 to be specific) into an antlr grammar, and so far
> straight translation is not working very well.
>
> If you have solutions to the examples above, or recommendations on how
> to cure lexical muddy thinking disease, I'm open to suggestions!
>
> Cheers,
> -Wes
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/antlr-interest/
<*> To unsubscribe from this group, send an email to:
antlr-interest-unsubscribe at yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list