[antlr-interest] Fun with ANTLR3: mystery of the huge lexer

Jim Idle jimi at temporal-wave.com
Sat Jun 30 17:15:30 PDT 2007


Yes, you need to use the fragment, though in this particular case you
might actually get away without one. If you leave rules that do not
create tokens as non-fragment rules, the lexer thinks it has to create
tokens for them and you get clashes.

The lexer assigns priority from top to bottom in listed order, so you
should be able to achieve what you need. I find you get simpler lexers,
though, if you code the rule yourself to eliminate clashes and set the
token type explicitly with $type = xxx;
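
To make the "code the rule yourself" idea concrete, here is a rough
hand-written Java sketch of the same decision outside of ANTLR. It is
not generated lexer code: the class, method and enum names and the set
of punctuation characters are all made up for illustration, and
RE_STRING is left out to keep it short. One entry point tries the
nested comment first and only then falls back to a run of punctuation,
which plays the role of picking the token type by hand.

```java
class PriorityLexerSketch {
    // Hypothetical token types, named after the rules in this thread.
    enum Type { ML_COMMENT, PUNC, OTHER }

    // A token is just a type plus the index one past its last character.
    static final class Token {
        final Type type;
        final int end;
        Token(Type type, int end) { this.type = type; this.end = end; }
    }

    // Assumed punctuation set; the real set is not given in the thread.
    static boolean isPunc(char c) {
        return "!$%&*+-/<=>?@^|~".indexOf(c) >= 0;
    }

    // Returns the index just past a (possibly nested) /* ... */ comment
    // starting at pos, or -1 if no complete comment starts there.
    // The two-character checks stand in for the LA(1)/LA(2) predicate.
    static int matchComment(String s, int pos) {
        if (!s.startsWith("/*", pos)) return -1;
        int i = pos + 2;
        int depth = 1;
        while (i < s.length()) {
            if (s.startsWith("/*", i)) {          // nested opener
                depth++;
                i += 2;
            } else if (s.startsWith("*/", i)) {   // closer
                depth--;
                i += 2;
                if (depth == 0) return i;
            } else {
                i++;                              // any other character
            }
        }
        return -1;                                // unterminated comment
    }

    // Fixed priority: comment first, then a run of punctuation.
    static Token next(String s, int pos) {
        int end = matchComment(s, pos);
        if (end >= 0) return new Token(Type.ML_COMMENT, end);
        end = pos;
        while (end < s.length() && isPunc(s.charAt(end))) end++;
        if (end > pos) return new Token(Type.PUNC, end);
        return new Token(Type.OTHER, pos + 1);    // placeholder for everything else
    }
}
```

With this ordering a string such as /*!*/ comes back as one ML_COMMENT,
while /*/ (no closing */) falls through to PUNC.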

User operators eh, how do they specify precedence?

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of David Piepgrass
> Sent: Saturday, June 30, 2007 4:12 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Fun with ANTLR3: mystery of the huge
> lexer
> 
> > Your ML_COMMENT needs to be a fragment rule and you need a predicate
> > to stop '.' interfering with ML_COMMENT. I just produce this rule for
> > my T-SQL lexer in fact (C here but the predicate is just input.LA(n)
> > for Java):
> 
> Thanks, but is it really necessary to use a fragment? At the end of my
> message I noted that this rule seems to work okay:
> 
> ML_COMMENT:
>     ('/*')=> '/*'
>     (options{greedy=false;} : ML_COMMENT | .)*
>     '*/'
>     { $channel = HIDDEN; };
> 
> ANTLR's architecture has changed and rules do not actually create
> tokens (did they in v2?). All token functions return void.
> 
> > fragment        ML_COMFRAG
> >             :
> >                     '/*' ( options { greedy=false;}
> >                                 : {(LA(1)== '/' && LA(2) == '*')}? ML_COMFRAG
> >                                 |  .
> >                                 )* '*/'
> >             ;
> >
> > That should help with that part. Then is your PUNC rule something
> > that returns a token, or are you using that somewhere else too?
> 
> PUNC returns a token and is not used anywhere else. Its job is to
> gather any sequence of adjacent punctuation into one token, which is a
> problem because a string like /*!*/ matches all three rules:
> ML_COMMENT, PUNC and RE_STRING.
> 
> It's too bad I can't assign "priorities" to each rule. I would like to
> match /* as a comment whenever possible, with /regular-expressions/
> having the next-highest priority and PUNC having the lowest.
> 
> The reason I treat punctuation this way, by the way, is that the set
> of available operators can be user-defined and it can vary by scope.
> Therefore it is not possible to identify operators within the lexer.
> 
> --
> - David
> http://qism.blogspot.com

