[antlr-interest] Reg Multi-line comments
Jim Idle
jimi at temporal-wave.com
Thu Jul 16 05:22:51 PDT 2009
This is an FAQ, really. I believe I have answered it more than once on the
list, but:
MLB : '/*' ( options { greedy = false; } : .* )
      ( '*/' | { /* print error message here */ } )
      { skip(); }
    ;
You might need EOF rather than just an empty alt.
You could also use:
( {!(input.LA(1) == '*' && input.LA(2) == '/')}?=> .)*
And similar variations, which are probably better.
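
As a rough illustration outside ANTLR, the same two-character lookahead test
can be sketched in plain Java. This is only a hand-written stand-in, not
generated lexer code and not the ANTLR runtime API: the class CommentSkipper
and the helpers la() and strip() are hypothetical names made up for the
example. It consumes characters while the next two are not "*/", and reports
an error if end of input arrives first.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentSkipper {
    /** End-of-input marker, playing the role of EOF (-1) in the lexer. */
    static final int EOF = -1;

    /** Hypothetical stand-in for input.LA(k): k-th character ahead of pos, or EOF. */
    static int la(String input, int pos, int k) {
        int i = pos + k - 1;
        return i < input.length() ? input.charAt(i) : EOF;
    }

    /**
     * Returns input with block comments removed. An unterminated comment is
     * dropped and a message naming its starting line is added to errors.
     */
    static String strip(String input, List<String> errors) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        int line = 1;
        while (pos < input.length()) {
            if (la(input, pos, 1) == '/' && la(input, pos, 2) == '*') {
                int startLine = line;
                pos += 2; // consume the opening delimiter
                // Same test as the predicate: keep going while the next
                // two characters are not '*' followed by '/'.
                while (la(input, pos, 1) != EOF
                        && !(la(input, pos, 1) == '*' && la(input, pos, 2) == '/')) {
                    if (input.charAt(pos) == '\n') {
                        line++;
                    }
                    pos++;
                }
                if (la(input, pos, 1) == EOF) {
                    // The "empty alt / EOF" case: no closing delimiter was found.
                    errors.add("line " + startLine
                            + ": unterminated comment, expected '*/' before end of input");
                } else {
                    pos += 2; // consume the closing delimiter
                }
            } else {
                char c = input.charAt(pos);
                if (c == '\n') {
                    line++;
                }
                out.append(c);
                pos++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        String rest = strip("/*\n123\n", errors);
        System.out.println("remaining text: [" + rest + "]");
        for (String e : errors) {
            System.out.println(e);
        }
    }
}
```

In a real grammar the error branch would live in the rule's action; the
sketch only shows where the EOF check sits relative to the loop.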
Jim
On Jul 16, 2009, at 4:56 AM, Gokulakannan Somasundaram <gokul007 at gmail.com> wrote:
> After doing a little research, I think this is my finding.
> The only information I can derive from the ANTLR infrastructure is
> that it was expecting a '*' but met an EOF. There is no way
> for me to find out that this '*' belongs to the lexer token
> ML_COMMENT. My analysis is based on the following.
>
> This is the grammar I used:
>
> grammar Expr1;
>
> @lexer::members {
>     //@Override
>     public String getErrorMessage(RecognitionException me, String[] tokenNames_)
>     {
>         String tokenName="<unknown>";
>         if( me instanceof MismatchedTokenException )
>         {
>             System.out.println(state.type);
>             System.out.println(ML_COMMENT);
>             if(state.type == ML_COMMENT)
>             {
>                 System.out.println("First Breakthrough");
>             }
>         }
>         System.out.println(me);
>         me.printStackTrace();
>         return tokenName;
>     }
> }
>
> ML_COMMENT
> : '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };
>
>
>
> ml_comment
> : ('\r\n')* { System.out.println("Success"); }
> ;
>
> When I entered
> /*
> 123
> Ctrl-Z
>
> I got
> 0
> 4
> MismatchedTokenException(-1!=42)
> MismatchedTokenException(-1!=42)
> at org.antlr.runtime.Lexer.match(Lexer.java:167)
> at Expr1Lexer.mML_COMMENT(Expr1Lexer.java:119)
> at Expr1Lexer.mTokens(Expr1Lexer.java:161)
> at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
> at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
> at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
> at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
> at Expr1Parser.ml_comment(Expr1Parser.java:45)
> at Main.main(Main.java:13)
> line 3:0 <unknown>
> Success
>
> But in the mML_COMMENT function, the type of the token being matched is
> never saved in the state. It is saved only after the token has been
> matched completely.
>
> // $ANTLR start "ML_COMMENT"
> public final void mML_COMMENT() throws RecognitionException {
>     try {
>         int _type = ML_COMMENT;
>         int _channel = DEFAULT_TOKEN_CHANNEL;
>         // Expr1.g:24:2: ( '/*' ( options {greedy=false; } : ( . )* ) '*/' )
>         // Expr1.g:24:4: '/*' ( options {greedy=false; } : ( . )* ) '*/'
>         {
>             match("/*");
>
>             // Expr1.g:24:9: ( options {greedy=false; } : ( . )* )
>             // Expr1.g:24:41: ( . )*
>             {
>                 // Expr1.g:24:41: ( . )*
>                 loop1:
>                 do {
>                     int alt1=2;
>                     int LA1_0 = input.LA(1);
>
>                     if ( (LA1_0=='*') ) {
>                         int LA1_1 = input.LA(2);
>
>                         if ( (LA1_1=='/') ) {
>                             alt1=2;
>                         }
>                         else if ( ((LA1_1>='\u0000' && LA1_1<='.')||(LA1_1>='0' && LA1_1<='\uFFFF')) ) {
>                             alt1=1;
>                         }
>                     }
>                     else if ( ((LA1_0>='\u0000' && LA1_0<=')')||(LA1_0>='+' && LA1_0<='\uFFFF')) ) {
>                         alt1=1;
>                     }
>
>                     switch (alt1) {
>                         case 1 :
>                             // Expr1.g:24:41: .
>                             {
>                                 matchAny();
>                             }
>                             break;
>
>                         default :
>                             break loop1;
>                     }
>                 } while (true);
>             }
>
>             match("*/"); // This is where the error occurs
>
>             skip();
>         }
>
>         state.type = _type;
>         state.channel = _channel;
>     }
>     finally {
>     }
> }
>
>
> Since the state is saved only after that point, I won't be able to get
> the token type from the overridden functions. So I think the only
> option is to add a token rule that matches the erroneous input and
> throws an exception from inside its action (it should be a user-defined
> one that the ANTLR infrastructure does not catch).
>
> ML_COMMENT_ERR
> : '/*' ( (~('*/'))* ) { System.out.println("Throw the Required Error here"); }
> ;
>
> It works. But I think it would be nice if the lexer carried the
> extra information about which token it is matching right now, for
> example:
>
> state.type_being_parsed = _type; at the top of the function.
>
>
> Please let me know whether my approach is correct.
>
> Thanks,
> Gokul.
>
>
>
> On Thu, Jul 16, 2009 at 3:44 PM, Gokulakannan Somasundaram <gokul007 at gmail.com> wrote:
> Hi,
> I am trying to filter out multi-line comments, for which I am
> using the following token definition (provided on antlr.org):
> ML_COMMENT
> : '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };
>
> But I intend to provide an informative error message when EOF occurs
> without any '*/'. Can someone help me with how to achieve this? I am
> trying out a lot of things, but nothing seems to work and I seem to be
> missing some basic fact/knowledge.
> Thanks,
> Gokul.
>
>