[antlr-interest] token precedence by decl order - or tutorial ambiguous
Darren Duncan
darren at darrenduncan.net
Sun Mar 4 21:21:23 PST 2012
Hello,
To briefly introduce myself, I am a new user of ANTLR and intend to use it to
help me develop grammars and parsers for my new database-savvy programming
language Muldis D; https://github.com/muldis/ is where all the related stuff is.
For the main point of this post ...
The tutorial at
http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required
is useful, but I found at least one part of it to be ambiguous. I'm hoping you
can help me clear up my understanding, and maybe the tutorial itself can be
clarified too.
The relevant portion of the tutorial is here between the pair of === lines:
==========
Another point of interest is the order of the token declaration. The earlier a
token is defined, the higher is the precedence if a certain input can be matched
by two or more tokens. This means that using the tokens command to define
keywords will match those keywords instead a more general ID rule. The following
code snippet provides an example:
start
: (WS | FOO)* EOF
;
WS : (options {greedy=false;} : ' '+) ;
FOO : ~('x' | 'y' | 'z')+ ;
If you give an input containing only spaces then WS will be chosen. Should one
change the rules order that FOO comes before WS then FOO will be chosen. Any
input containing other characters than spaces will match FOO, even if two or
more WS and FOO tokens could be produced. The lexer rules will match greedily
the maximum of applicable characters.
==========
Now the ambiguity I see concerns the "order of the token declaration" part. I
don't know whether it refers to the order of the alternatives in the parser
rule line containing "(WS | FOO)" or to the order of the 2 lexer rule lines
"WS : ..." vs "FOO : ...". This is because the order is [WS,FOO] in both places.
If the tutorial could be updated to, say, either of these, it would be much
clearer:
1:
start
: (FOO | WS)* EOF
;
WS : (options {greedy=false;} : ' '+) ;
FOO : ~('x' | 'y' | 'z')+ ;
2:
start
: (WS | FOO)* EOF
;
FOO : ~('x' | 'y' | 'z')+ ;
WS : (options {greedy=false;} : ' '+) ;
So, bottom line, between #1 and #2 here, which example would have WS taking
higher precedence, and which example would have FOO taking higher precedence?
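To make the question concrete, here is a toy model in Python of the behaviour
the tutorial seems to describe: the longest match wins, and on a tie the rule
declared first wins. This is only a sketch and not ANTLR's actual algorithm.
The tokenize function, the regex translations of WS and FOO, and the names
rules_1 and rules_2 are all made up for illustration. It assumes that "order"
means the order of the lexer rule definitions, so the parser rule "start" is
not modelled at all, and it ignores the greedy=false option on WS.

```python
import re


def tokenize(text, rules):
    """Toy lexer: longest match wins; ties go to the rule declared first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps the win when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Hypothetical regex stand-ins for the two lexer rules in the tutorial.
WS = ("WS", r" +")
FOO = ("FOO", r"[^xyz]+")

rules_1 = [WS, FOO]  # lexer rule order of example 1 (WS defined first)
rules_2 = [FOO, WS]  # lexer rule order of example 2 (FOO defined first)

print(tokenize("   ", rules_1))  # [('WS', '   ')]
print(tokenize("   ", rules_2))  # [('FOO', '   ')]
print(tokenize("a b", rules_1))  # [('FOO', 'a b')]
```

Under this assumed reading, example 1 would give WS the higher precedence on
an all-spaces input and example 2 would give it to FOO. Whether ANTLR really
behaves this way is exactly what I am asking.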
Thank you in advance for your help.
-- Darren Duncan