[antlr-interest] beginner questions concerning nondeterminism warnings

Scott Amort jsamort at sympatico.ca
Sat Jan 21 15:02:02 PST 2006


Hello All,

I'm new to ANTLR and compiler building in general, and am trying to
build a scanner (and eventually a parser) for a relatively simple
textual description language to get myself familiar with the ANTLR
program.

Here is a snippet of my .g file:

protected CHAR
  : 'a'..'z'
  | 'A'..'Z'
  ;

protected ALPHA
  : (CHAR) (CHAR)+
  ;

EVENT
  : ('a'..'h') ( "is" | "es" | '&' | '#' | "&&" | "##" )?
  | '_'
  ;

TAG
  : "\\"! ALPHA
  ;

IDENT
  : ALPHA EQUALS!
  ;

EQUALS   : '=';

Essentially, I am dealing with with a language that's needs to
differentiate between events (i.e. letters a through h, with an optional
postfix of 'is', 'es', '#', '##', '&' or '&&'; or the underscore '_',
for example cis, a# or _) and possible variables (i.e. style="bold" or
width=1cm).  The scanner currently recognizes numbers, operators, string
literals and units of measurement fine, but is giving me non-determinism
warnings when I add in the IDENT section.  It works as expected with the
IDENT section commented out (but of course, doesn't correctly scan the
input file).  Here is the warning:

test.g: warning:lexical nondeterminism between rules EVENT and IDENT
upon
test.g:     k==1:'a'..'h'
test.g:     k==2:'e','i'

I think I understand what this means -- with a k=2 lookahead, the
scanner is unable to differentiate between the two rules.

test.g: warning:lexical nondeterminism between rules EVENT and IDENT
upon
test.g:     k==1:'a'..'h'
test.g:     k==2:'e','i'
test.g:     k==3:<end-of-token>,'s'

But, if I change to k=3 lookahead, the situation does not improve, as I
get the above error message (nor does it help with k=4).  However, it
would seem to me that at this point I have looked far enough ahead to
determine that it does or does not match EVENT.  So, I am clearly
misunderstanding something here.  Perhaps someone could lend me a hand?

As well, when I add in some debug lines, I get some very strange results
if I try to run the scanner despite the warnings:

Found equals: style=
Found identifier: style

The equals rule seems to be matching the entire ident string, instead of
just the equals sign.  And it seems to be matching the equals rule
before the ident rule.  Now I am even more confused!

What is the best way to differentiate between matching ALPHA '=' and a
specific sub-set of ALPHA (i.e. EVENT)?  Thanks very much for any help
that can be provided.

Best Regards,
Scott




More information about the antlr-interest mailing list