[antlr-interest] Fun with ANTLR3: mystery of the huge lexer
David Piepgrass
qwertie256 at gmail.com
Sat Jun 30 16:12:17 PDT 2007
> Your ML_COMMENT needs to be a fragment rule and you need a predicate to
> stop '.' interfering with ML_COMMENT. I just produce this rule for my
> T-SQL lexer in fact (C here but the predicate is just input.LA(n) for
> Java):
Thanks, but is it really necessary to use a fragment? At the end of my
message I noted that this rule seems to work okay:
ML_COMMENT:
('/*')=> '/*'
(options{greedy=false;} : ML_COMMENT | .)*
'*/'
{ $channel = HIDDEN; };
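To make clear what I expect that rule to match, here is a rough hand-written Java sketch of the same recursive shape. It is not ANTLR-generated code; the class name NestedComment and the method matchEnd are hypothetical, made up for illustration only.

```java
// Hypothetical helper, not part of ANTLR: mirrors the recursive shape of ML_COMMENT.
final class NestedComment {
    /**
     * Returns the index just past the closing delimiter of the comment that
     * starts at pos, or -1 if no comment starts there or it is unterminated.
     */
    static int matchEnd(String s, int pos) {
        if (!s.startsWith("/*", pos)) {
            return -1;                      // plays the role of the ('/*')=> guard
        }
        int i = pos + 2;
        while (i < s.length()) {
            if (s.startsWith("*/", i)) {
                return i + 2;               // non-greedy exit: stop at the first closer
            }
            if (s.startsWith("/*", i)) {
                int inner = matchEnd(s, i); // the nested ML_COMMENT alternative
                if (inner < 0) {
                    return -1;              // inner comment never closed
                }
                i = inner;
            } else {
                i++;                        // the '.' alternative: consume one character
            }
        }
        return -1;                          // ran out of input before the closer
    }
}
```

The sketch checks for the closer before the nested opener, which is how I read greedy=false: leave the loop as soon as the closing delimiter is visible.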
ANTLR's architecture has changed and rules do not actually create
tokens (did they in v2?). All token functions return void.
> fragment ML_COMFRAG
> :
> '/*' ( options { greedy=false;}
> : {(LA(1)== '/' && LA(2) == '*')}? ML_COMFRAG
> | .
> )* '*/'
> ;
>
> That should help with that part. Then is your PUNC rule something that
> returns a token, or are you using that somewhere else too?
PUNC returns a token and is not used anywhere else. Its job is to
gather any sequence of adjacent punctuation into one token, which is a
problem because a string like /*!*/ matches all three rules:
ML_COMMENT, PUNC and RE_STRING.
It's too bad I can't assign "priorities" to each rule. I would like to
match /* as a comment whenever possible, with /regular-expressions/
having the next-highest priority and PUNC having the lowest.
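The kind of priority ordering I have in mind could be sketched like this in plain Java. This is only an illustration of "try the rules in a fixed order", not something ANTLR provides. The three patterns are simplified assumptions standing in for my real rules (the comment pattern here does not nest), and the names PriorityLexer and classify are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: classify the text at a position by trying rules in priority order.
final class PriorityLexer {
    enum Kind { ML_COMMENT, RE_STRING, PUNC, NONE }

    // Assumed, simplified token shapes (illustration only).
    private static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern RE_STRING = Pattern.compile("/[^/\\s]+/");
    private static final Pattern PUNC = Pattern.compile("\\p{Punct}+");

    static Kind classify(String s, int pos) {
        // Highest priority first: comment, then regular expression, then punctuation.
        Pattern[] byPriority = { COMMENT, RE_STRING, PUNC };
        Kind[] kinds = { Kind.ML_COMMENT, Kind.RE_STRING, Kind.PUNC };
        for (int k = 0; k < byPriority.length; k++) {
            Matcher m = byPriority[k].matcher(s).region(pos, s.length());
            if (m.lookingAt()) {
                return kinds[k];            // first rule that matches at pos wins
            }
        }
        return Kind.NONE;
    }
}
```

With this ordering, /*!*/ is classified as a comment even though the other two patterns would also accept it.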
Incidentally, the reason I treat punctuation this way is that the set
of available operators can be user-defined and it can vary by scope.
Therefore it is not possible to identify operators within the lexer.
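To illustrate why the lexer cannot decide this, here is a rough Java sketch of how a later stage might break a PUNC token into operators once the scope is known. The longest-match strategy and the names OperatorSplitter and split are my assumptions for this example, not a description of existing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: split a run of punctuation using the operators visible in one scope.
final class OperatorSplitter {
    static List<String> split(String punc, Set<String> operators) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < punc.length()) {
            int end = -1;
            // Assumed policy: prefer the longest operator defined in this scope.
            for (int j = punc.length(); j > i; j--) {
                if (operators.contains(punc.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            if (end < 0) {
                throw new IllegalArgumentException(
                    "no operator in scope at index " + i + " of \"" + punc + "\"");
            }
            out.add(punc.substring(i, end));
            i = end;
        }
        return out;
    }
}
```

The same PUNC text splits differently depending on the operator set passed in, which is why the decision has to wait until the scope is known.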
--
- David
http://qism.blogspot.com