[antlr-interest] Tokenizing question

Randall R Schulz rschulz at sonic.net
Wed Jul 25 07:21:49 PDT 2007


On Wednesday 25 July 2007 03:25, Andrew Lentvorski wrote:
> Gavin Lambert wrote:
> > At 16:58 25/07/2007, Andrew Lentvorski wrote:
> >  > ...
> >
> > Because matches are greedy.  Once it's started matching a TT, it
> > will keep going as long as it continues to match.  So since a space
> > is a valid TT character, any whitespace following a TT will be part
> > of that TT.
>
> Yeah, a little more experimenting and I figured out that it was going
> with greedy (maximal munch) precedence rather than specified order.
>
> Interesting question: Is there a way to change that?

Precedence, as you know, is based on rule order: the earlier rule wins
(but only once other criteria, such as longest-match-wins, have ended in
a "tie"). There is an option you can set on a subrule to change its
matching from greedy to reluctant. Here's the example used in TDAR (The
Definitive ANTLR Reference; page 85 of the printed book, page 100 in the
PDF):

ML_COMMENT
    :   '/*' ( options { greedy = false; } : . ) * '*/'
    ;

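That should be clear enough on how to use it. As for what it does: with
greedy = false, the ( . )* loop exits as soon as the input ahead matches
'*/', rather than consuming those characters as part of the comment body.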
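
If it helps to see the greedy/reluctant difference outside ANTLR, here is
a rough analogy using Python's re module. To be clear, this is ordinary
regex matching, not an ANTLR lexer, and the sample string and variable
names are made up for illustration. It only shows the general idea: a
greedy loop runs on to the last possible terminator, while a reluctant
one stops at the first.

```python
import re

# Hypothetical input containing two block comments.
text = "/* one */ x = 1; /* two */"

# Greedy: '.*' consumes as much as it can, so the match extends
# to the LAST '*/' in the input.
greedy = re.search(r"/\*.*\*/", text, re.DOTALL).group(0)

# Reluctant: '.*?' consumes as little as it can, so the match ends
# at the FIRST '*/'. This is the behaviour the greedy = false option
# is asking for in the ML_COMMENT rule.
reluctant = re.search(r"/\*.*?\*/", text, re.DOTALL).group(0)

print(greedy)
print(reluctant)
```

The greedy pattern swallows everything between the first '/*' and the
final '*/', including the code in between; the reluctant one yields just
the first comment.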


> ...
>
> I'm currently looking at the XML parsing example on the Wiki.  I
> think I'm going to have to use gates to do what I need to do.

Dare I again bring up the question of whether it's sensible to use ANTLR 
or any similar tool to parse XML?


> -a


Randall Schulz

