[antlr-interest] How to combine tokens around comments without using AST

Jim Idle jimi at temporal-wave.com
Wed Apr 25 09:12:33 PDT 2012


Blindly combining the tokens assumes that comments only occur between ID
tokens. In fact, you would need to check the tokens before and after each
comment, and then delete any other COMMENT tokens as well. I think that
the lexer-based solution is better.
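
To make the check concrete, here is a rough sketch of the token-stream
approach with the neighbour test added. It is plain Python rather than any
ANTLR API: tokens are modelled as hypothetical (type, text) tuples, and the
NUM type in the usage note is invented purely for illustration.

```python
from typing import List, Tuple

# (type, text) pairs; a stand-in for real lexer tokens, not an ANTLR class.
Token = Tuple[str, str]


def merge_ids_around_comments(tokens: List[Token]) -> List[Token]:
    """Drop every COMMENT; join two IDs only when comments alone separate them."""
    out: List[Token] = []
    after_comment = False  # True while the most recent input token was a COMMENT
    for kind, text in tokens:
        if kind == "COMMENT":
            after_comment = True
            continue  # comments never reach the output
        if kind == "ID" and after_comment and out and out[-1][0] == "ID":
            # ID COMMENT+ ID: append this text to the previous ID token.
            out[-1] = ("ID", out[-1][1] + text)
        else:
            out.append((kind, text))
        after_comment = False
    return out
```

With this, "Test" {Comments} "er" becomes a single ID "Tester", while a
comment sitting between an ID and some other token (say a NUM) is simply
dropped and nothing is merged.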

However, in the lexer-based solution I gave earlier, I used the supplied
definition of COMMENT, which will eat all characters to EOF if there is a
lexical error in the COMMENT (for example, a missing closing }). Unless
comments are meant to be multi-line, I would rewrite that token to stop at
the end of the line and give an error or, better still, use custom code to
consume characters when { is seen.
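
As a rough illustration of the custom-code idea, the sketch below scans a
single-line comment by hand and reports an error instead of running on to
EOF. It is plain Python, not generated lexer code; the function name
scan_comment and the error message are assumptions made for the example.

```python
from typing import Optional, Tuple


def scan_comment(text: str, start: int) -> Tuple[int, Optional[str]]:
    """Scan a '{...}' comment that begins at text[start].

    Returns (index just past what was consumed, error message or None).
    The scan stops at the closing '}', or at end of line / end of input.
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("scan_comment must be called at a '{'")
    i = start + 1
    while i < len(text) and text[i] not in "}\r\n":
        i += 1
    if i < len(text) and text[i] == "}":
        return i + 1, None  # consumed the whole comment, including '}'
    # Hit a newline or the end of input first: leave the newline unconsumed
    # so that normal lexing can resume on the next line.
    return i, f"unterminated comment starting at offset {start}"
```

For the input "Test{Comments}er", calling scan_comment at offset 4 consumes
through the closing } and lexing can carry on with "er"; for a comment with
no closing }, the damage is limited to the rest of that line.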

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Eric
> Sent: Wednesday, April 25, 2012 4:07 AM
> To: Jamal Haider
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] How to combine tokens around comments
> without using AST
>
> Hi Jim,
>
> Off the top of my head.
>
> Use the TokenRewriteStream class after the lexer.
>
> Don't hide the comment tokens. Use them as input to the
> TokenRewriteStream and then when you see a comment token, delete it and
> combine the text from the tokens before and after the comment into the
> token before the comment, and delete the token after the comment.
>
> Eric
> On Wed, Apr 25, 2012 at 6:34 AM, Jamal Haider
> <syedjamalhaider at gmail.com> wrote:
>
> > I am a newbie to ANTLR and am using it to develop a parser for an
> > ambiguous language. What I want to do is to somehow combine the
> > tokens around the "comments" into one token without using an AST.
> >
> > I am using this simple grammar to illustrate the problem
> >
> > grammar test;
> >
> > query
> >    :     expression+
> >    ;
> >
> > expression
> >    :   alpha
> >    ;
> >
> >
> > alpha
> >    : ID
> >    ;
> >
> >
> > ID  :   ('a'..'z'|'A'..'Z'|'_')+
> >    ;
> >
> >
> > COMMENT
> >    :   '{' ( options {greedy=false;} : . )* '}' {$channel=HIDDEN;}
> >    ;
> >
> > Now if we run it on the simple text "Test{Comments}er", two
> > separate tokens are generated, i.e. "Test" and "er", while I want to
> > create a single token out of it. Any help will be much appreciated.
> >
> > Thanks in advance
> >
> > Jim
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >

