[antlr-interest] Tokenizing question

Andrew Lentvorski bsder at allcaps.org
Wed Jul 25 03:25:27 PDT 2007


Gavin Lambert wrote:
> At 16:58 25/07/2007, Andrew Lentvorski wrote:
>  >Yes, it's true that my TT rule matches whitespace, but why
>  >does the whitespace even get down to it?
>  >
>  >I expected the MWS and TT rules to match and the MWS takes
>  >precedence because it matches first.
>  >
>  >Why is this not occurring?
> 
> Because matches are greedy.  Once it's started matching a TT, it will 
> keep going as long as it continues to match.  So since a space is a 
> valid TT character, any whitespace following a TT will be part of that TT.

Yeah, after a little more experimenting I figured out that the lexer was 
going with greedy (maximal munch) matching rather than the order in 
which the rules are specified.
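
Gavin's point can be sketched outside ANTLR with plain Python regular 
expressions.  The patterns below are hypothetical stand-ins for MWS and 
TT (the real rules are not shown in this thread); the only assumption 
carried over is that whitespace is a valid TT character.  This is a toy 
illustration, not ANTLR's actual lexer algorithm.

```python
import re

# Hypothetical stand-ins for the MWS and TT rules from this thread.
# Assumption: TT accepts letters plus the same whitespace that MWS accepts.
MWS = re.compile(r"[ \t]+")
TT = re.compile(r"[A-Za-z \t]+")

text = "foo  bar"

# At position 0 the input starts with a letter, so MWS cannot match there.
mws_at_start = MWS.match(text)

# TT can start at position 0, and its greedy loop keeps consuming as long
# as the next character is a valid TT character, including the spaces.
tt_lexeme = TT.match(text).group()

print(mws_at_start)       # MWS never gets a chance at the inner spaces
print(repr(tt_lexeme))    # the whole input ends up in a single TT
```

Because the TT loop is already running when the spaces arrive, the 
order in which MWS and TT are listed never comes into play.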

Interesting question: Is there a way to change that?

However, this probably means that the book "The Definitive ANTLR 
Reference" has an error in Chapter 3, p. 46.  It specifies both a 
NEWLINE rule and a WS rule, and for the input ' \n' the WS rule will 
match and vacuum up the newline (since ' \n' is longer than either ' ' 
or '\n').
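
A minimal sketch of the two precedence policies, written in Python 
rather than ANTLR.  The rule set is an assumption: it is NOT copied from 
the book, it only mimics a NEWLINE rule alongside a WS rule that also 
accepts '\n'.  The tokenize() helper and its "first"/"longest" policy 
names are likewise made up for illustration and do not reflect how 
ANTLR's generated lexers work internally.

```python
import re

# Hypothetical rules in "grammar order" (not the book's actual grammar).
RULES = [
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("ID", re.compile(r"[A-Za-z]+")),
]

def tokenize(text, policy):
    """Split text into (rule_name, lexeme) pairs.

    policy="first":   the first rule in RULES that matches wins.
    policy="longest": the rule with the longest match wins (maximal
                      munch); ties go to the earlier rule.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        # Collect every rule that matches a non-empty lexeme at pos.
        candidates = []
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                candidates.append((name, m.group()))
        if not candidates:
            raise ValueError(f"no rule matches at position {pos}")
        if policy == "first":
            name, lexeme = candidates[0]
        else:
            # max() returns the first maximal item, so ties keep rule order.
            name, lexeme = max(candidates, key=lambda c: len(c[1]))
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

space_then_newline = tokenize("a \nb", "longest")
newline_first = tokenize("a\n b", "first")
newline_longest = tokenize("a\n b", "longest")
```

With these assumed rules, the ' \n' in "a \nb" comes out as a single WS 
token and no NEWLINE is ever produced.  For "a\n b" the two policies 
disagree: "first" yields a NEWLINE followed by a WS, while "longest" 
folds '\n ' into one WS.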

I'm currently looking at the XML parsing example on the Wiki.  I think 
I'm going to have to use gated semantic predicates to do what I need to 
do.

-a

