[antlr-interest] Tokenizing question

Gavin Lambert antlr at mirality.co.nz
Wed Jul 25 02:01:40 PDT 2007


At 16:58 25/07/2007, Andrew Lentvorski wrote:
 >Yes, it's true that my TT rule matches whitespace, but why
 >does the whitespace even get down to it?
 >
 >I expected the MWS and TT rules to match and the MWS takes
 >precedence because it matches first.
 >
 >Why is this not occurring?

Because matches are greedy.  Once the lexer has started matching 
a TT, it keeps consuming characters for as long as they are valid 
in a TT.  Since a space is a valid TT character, any whitespace 
following a TT becomes part of that TT.

I'm less certain about the leading whitespace you showed there, 
but that's probably the "longest match wins" rule at work.  Given 
a choice between emitting two tokens and one token that cover the 
exact same input text, ANTLR will pick the single token, 
regardless of rule ordering.

In other words, for the input fragment "( a1 =", your grammar 
gives ANTLR the option of emitting either 
'(' MWS=' ' TT='a1' MWS=' ' '=' (5 tokens) or 
'(' TT=' a1 ' '=' (3 tokens).  The 3-token version wins.
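
To make that concrete, here is a rough Python model of a 
"longest match wins" lexer.  It is only a sketch: the patterns 
are my guesses at what your rules look like (your actual grammar 
may differ), the names LPAREN and EQ are made up for 
illustration, and this is a simplified maximal-munch loop, not 
how ANTLR really predicts tokens.

```python
import re

# Hypothetical stand-ins for the grammar's lexer rules (assumed, not
# taken from the real grammar). Listed in "grammar order".
RULES = [
    ("LPAREN", re.compile(r"\(")),
    ("EQ", re.compile(r"=")),
    ("MWS", re.compile(r"[ \t]+")),
    ("TT", re.compile(r"[A-Za-z0-9 ]+")),  # space is a valid TT character
]

def tokenize(text):
    """Emit (rule_name, matched_text) pairs using longest-match-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)  # greedy match starting at pos
            # Strict '>' means rule order only breaks ties in length.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens

tokens = tokenize("( a1 =")
for name, text in tokens:
    print(name, repr(text))
```

At position 1 both MWS and TT can match, but MWS only covers the 
single space while TT covers ' a1 ', so TT is chosen even though 
MWS is listed first.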

Also, you should really make your WS rule a fragment, since it's 
covered by the MWS rule anyway.  A fragment rule can only be 
called from other lexer rules and never produces a token of its 
own.
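
As an analogy only (this is Python, not ANTLR syntax), a fragment 
behaves like a sub-pattern that other rules reuse but that is 
never in the list of token-producing rules.  The definitions of 
WS and MWS here are assumptions, and WORD is a made-up rule added 
just so there is something to tokenize.

```python
import re

# "Fragment": a reusable building block, never emitted as a token itself.
WS = r"[ \t]"

# Token-producing rules. MWS is built from the WS fragment; WORD is a
# hypothetical rule added only for this illustration.
TOKEN_RULES = [
    ("MWS", rf"(?:{WS})+"),
    ("WORD", r"[A-Za-z0-9]+"),
]

# One combined pattern with a named group per token rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))

def tokenize(text):
    """Return (rule_name, matched_text) pairs; unmatched characters are skipped."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]

tokens = tokenize("a  b")
names = {name for name, _ in tokens}
print(tokens)
```

Because WS is not in the rule table, the lexer can never emit a 
lone WS token; runs of whitespace always come out as MWS.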


