[antlr-interest] Tokenizing question
bsder at allcaps.org
Wed Jul 25 03:25:27 PDT 2007
Gavin Lambert wrote:
> At 16:58 25/07/2007, Andrew Lentvorski wrote:
> >Yes, it's true that my TT rule matches whitespace, but why
> >does the whitespace even get down to it?
> >I expected the MWS and TT rules to match and the MWS takes
> >precedence because it matches first.
> >Why is this not occurring?
> Because matches are greedy. Once it's started matching a TT, it will
> keep going as long as it continues to match. So since a space is a
> valid TT character, any whitespace following a TT will be part of that TT.
Yeah, after a little more experimenting I figured out that it uses
greedy (maximal munch) matching rather than the order in which the rules
are specified.
Interesting question: Is there a way to change that?
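To make the greedy behaviour concrete, here is a toy sketch in Python (not ANTLR, and not a model of how ANTLR's lexer is implemented). The MWS and TT patterns are hypothetical stand-ins, since the real grammar is not shown in this thread. The tokenizer tries rules strictly in the listed order, yet TT still swallows the whitespace: at the letter 'f' only TT can start, and once started it keeps consuming every character it accepts, spaces included.

```python
import re

def tokenize(text, rules):
    """Try rules in listed order at each position; the first that matches wins."""
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched at this offset.
            raise ValueError(f"no rule matches at offset {pos}")
    return tokens

# Hypothetical stand-ins for the MWS and TT rules (assumptions, not the real grammar).
with_space = [("MWS", r" +"), ("TT", r"[a-z ]+")]    # TT accepts spaces
without_space = [("MWS", r" +"), ("TT", r"[a-z]+")]  # TT excludes spaces

print(tokenize("foo  bar", with_space))     # a single TT token
print(tokenize("foo  bar", without_space))  # TT, MWS, TT
```

In this toy model, listing MWS first makes no difference. Taking the space out of TT's character set is what lets MWS see the whitespace.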
However, this probably means that the book "The Definitive ANTLR
Reference" has an error in Chapter 3, p. 46. It specifies both a NEWLINE
and a WS rule, so the input ' \n' will match the WS rule, which vacuums
up the NEWLINE (since ' \n' is longer than either ' ' or '\n').
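A small Python sketch of that concern, under a pure longest-match model. The rule shapes below are assumptions for illustration and are not copied from the book, and the model is not ANTLR's actual lexer. When WS is allowed to contain '\n', the input "1 \n2" never produces a NEWLINE token. When WS is restricted to spaces and tabs, it does.

```python
import re

def longest_match_tokens(text, rules):
    """At each position keep the longest lexeme; earlier rules win ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer only, so the first listed rule wins a tie.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

# Assumed rule shapes (hypothetical), chosen only to show the overlap.
overlapping = [("INT", r"[0-9]+"), ("NEWLINE", r"\r?\n"), ("WS", r"[ \t\r\n]+")]
disjoint = [("INT", r"[0-9]+"), ("NEWLINE", r"\r?\n"), ("WS", r"[ \t]+")]

def names(tokens):
    return [name for name, _ in tokens]

print(names(longest_match_tokens("1 \n2", overlapping)))  # ['INT', 'WS', 'INT']
print(names(longest_match_tokens("1 \n2", disjoint)))     # ['INT', 'WS', 'NEWLINE', 'INT']
```

With the overlapping rules, a bare "1\n2" still yields NEWLINE, because both rules match one character and the earlier rule wins the tie. Only whitespace directly before the line break hides it.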
I'm currently looking at the XML parsing example on the Wiki. I think
I'm going to have to use gates to do what I need to do.