[antlr-interest] Reg Multi-line comments

Thu Jul 16 05:22:51 PDT 2009

This is an FAQ, really. I believe I have answered it more than once on the
list, but:

MLB : '/*' ( options { greedy = false; } : .* )
      ( '*/' | { /* print error message */ } )
      { skip(); }
    ;

You might need EOF rather than just an empty alt.

You could also use:

( {!(input.LA(1) == '*' && input.LA(2) == '/')}?=> .)*

And similar variations, which are probably better.
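
For readers who want to see what that predicate is testing, here is a
minimal plain-Java sketch of the same lookahead logic, written without any
ANTLR dependency. The class name CommentScanner and the helpers la() and
skipComment() are made up for illustration and are not part of the ANTLR
runtime; the only thing borrowed from ANTLR is the convention that lookahead
past the end of input yields -1 (EOF). It illustrates the idea and is not
generated lexer code.

```java
public final class CommentScanner {
    // Sentinel for "no more input", mirroring ANTLR's EOF value of -1.
    static final int EOF = -1;

    // Hypothetical helper: k-th lookahead character (1-based) from pos, or EOF.
    static int la(String input, int pos, int k) {
        int i = pos + k - 1;
        return i < input.length() ? input.charAt(i) : EOF;
    }

    // Skips a comment whose opening delimiter starts at 'start'.
    // Returns the index just past the closing delimiter,
    // or -1 if the input ends before the comment is closed.
    static int skipComment(String input, int start) {
        if (!input.startsWith("/*", start)) {
            throw new IllegalArgumentException("no comment at offset " + start);
        }
        int pos = start + 2;
        // Same test as the gated predicate: keep consuming one character
        // at a time while the next two characters are not '*' then '/'.
        while (la(input, pos, 1) != EOF
                && !(la(input, pos, 1) == '*' && la(input, pos, 2) == '/')) {
            pos++;
        }
        if (la(input, pos, 1) == EOF) {
            return -1; // unterminated comment: this is where to report the error
        }
        return pos + 2; // step over the closing delimiter
    }

    public static void main(String[] args) {
        System.out.println(skipComment("/* ok */x", 0)); // prints 8
        System.out.println(skipComment("/*\n123\n", 0)); // prints -1
    }
}
```

A return value of -1 marks the spot where a real lexer rule would report the
unterminated comment, which is the role of the error alternative in the MLB
rule above.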

Jim
On Jul 16, 2009, at 4:56 AM, Gokulakannan Somasundaram <gokul007 at gmail.com> wrote:

> After doing a little research, I think this is my finding.
> The only information I can derive from the ANTLR infrastructure is
> that it was expecting a '*' but met an EOF. There is no way
> I would be able to find out that this '*' is for the lexer token
> ML_COMMENT. My analysis is based on the following.
>
> This is the grammar I used:
>
> grammar Expr1;
>
> @lexer::members {
>     //@Override
>     public String getErrorMessage(RecognitionException me, String[] tokenNames_)
>     {
>         String tokenName="<unknown>";
>         if( me instanceof MismatchedTokenException )
>         {
>             System.out.println(state.type);
>             System.out.println(ML_COMMENT);
>             if(state.type == ML_COMMENT)
>             {
>                 System.out.println("First Breakthrough");
>             }
>         }
>         System.out.println(me);
>         me.printStackTrace();
>         return tokenName;
>     }
> }
>
> ML_COMMENT
>     :    '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };
>
>
>
> ml_comment
>     :    ('\r\n')* { System.out.println("Success"); }
>     ;
>
> when i entered
> /*
> 123
> Ctrl-Z
>
> i got
> 0
> 4
> MismatchedTokenException(-1!=42)
> MismatchedTokenException(-1!=42)
>         at org.antlr.runtime.Lexer.match(Lexer.java:167)
>         at Expr1Lexer.mML_COMMENT(Expr1Lexer.java:119)
>         at Expr1Lexer.mTokens(Expr1Lexer.java:161)
>         at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
>         at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
>         at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
>         at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
>         at Expr1Parser.ml_comment(Expr1Parser.java:45)
>         at Main.main(Main.java:13)
> line 3:0 <unknown>
> Success
>
> But in the mML_COMMENT function, the type of the token being parsed is
> never saved in the state. It is saved only after the token has been
> parsed completely.
>
>     // $ANTLR start "ML_COMMENT"
>     public final void mML_COMMENT() throws RecognitionException {
>         try {
>             int _type = ML_COMMENT;
>             int _channel = DEFAULT_TOKEN_CHANNEL;
>             // Expr1.g:24:2: ( '/*' ( options {greedy=false; } : ( . )* ) '*/' )
>             // Expr1.g:24:4: '/*' ( options {greedy=false; } : ( . )* ) '*/'
>             {
>             match("/*");
>
>             // Expr1.g:24:9: ( options {greedy=false; } : ( . )* )
>             // Expr1.g:24:41: ( . )*
>             {
>             // Expr1.g:24:41: ( . )*
>             loop1:
>             do {
>                 int alt1=2;
>                 int LA1_0 = input.LA(1);
>
>                 if ( (LA1_0=='*') ) {
>                     int LA1_1 = input.LA(2);
>
>                     if ( (LA1_1=='/') ) {
>                         alt1=2;
>                     }
>                     else if ( ((LA1_1>='\u0000' && LA1_1<='.')||(LA1_1>='0' && LA1_1<='\uFFFF')) ) {
>                         alt1=1;
>                     }
>
>
>                 }
>                 else if ( ((LA1_0>='\u0000' && LA1_0<=')')||(LA1_0>='+' && LA1_0<='\uFFFF')) ) {
>                     alt1=1;
>                 }
>
>
>                 switch (alt1) {
>                 case 1 :
>                     // Expr1.g:24:41: .
>                     {
>                     matchAny();
>
>                     }
>                     break;
>
>                 default :
>                     break loop1;
>                 }
>             } while (true);
>
>
>             }
>
>             match("*/");    //This is where the error occurs
>
>              skip();
>
>             }
>
>             state.type = _type;
>             state.channel = _channel;
>         }
>         finally {
>         }
>     }
>
>
> Since the state is saved after that point, I won't be able to get it
> by overriding functions. So I think the only option is to match the
> erroneous token with a separate rule and throw an exception from
> inside it (a user-defined one, which the ANTLR infrastructure won't
> catch).
>
> ML_COMMENT_ERR
>     :    '/*' (  (~('*/'))* ) {  System.out.println("Throw the Required Error here"); }
>     ;
>
> It works. But I think it would be nice if the lexer carried extra
> information about which token it is parsing right now, like
>
> state.type_being_parsed = _type; at the top of the function.
>
>
> Please let me know whether my approach is correct.
>
> Thanks,
> Gokul.
>
>
>
> On Thu, Jul 16, 2009 at 3:44 PM, Gokulakannan Somasundaram <gokul007 at gmail.com> wrote:
> Hi,
>    I am trying to filter out multi-line comments, for which i am  
> using the following Token definition (provided in antlr.org)
> ML_COMMENT
>     :    '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };
>
> But I intend to provide an informative error message when EOF occurs
> without any '*/'.  Can someone help me on how to achieve this? I am
> trying out a lot of things, but nothing seems to work and I seem to
> be missing some basic fact/knowledge.
> Thanks,
> Gokul.
>
>