[antlr-interest] Reg Multi-line comments
Gokulakannan Somasundaram
gokul007 at gmail.com
Thu Jul 16 04:56:38 PDT 2009
After a little research, this is what I found. The only information I can
get from the ANTLR infrastructure is that it was expecting a '*' but met
EOF. There is no way for me to find out that this '*' belongs to the lexer
token ML_COMMENT. My analysis is based on the following.
This is the grammar I used:
grammar Expr1;

@lexer::members {
    //@Override
    public String getErrorMessage(RecognitionException me, String[] tokenNames_)
    {
        String tokenName = "<unknown>";
        if (me instanceof MismatchedTokenException)
        {
            System.out.println(state.type);
            System.out.println(ML_COMMENT);
            if (state.type == ML_COMMENT)
            {
                System.out.println("First Breakthrough");
            }
        }
        System.out.println(me);
        me.printStackTrace();
        return tokenName;
    }
}

ML_COMMENT
    : '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };

ml_comment
    : ('\r\n')* { System.out.println("Success"); }
    ;
When I entered
/*
123
Ctrl-Z
I got
0
4
MismatchedTokenException(-1!=42)
MismatchedTokenException(-1!=42)
at org.antlr.runtime.Lexer.match(Lexer.java:167)
at Expr1Lexer.mML_COMMENT(Expr1Lexer.java:119)
at Expr1Lexer.mTokens(Expr1Lexer.java:161)
at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
at Expr1Parser.ml_comment(Expr1Parser.java:45)
at Main.main(Main.java:13)
line 3:0 <unknown>
Success
But in the generated mML_COMMENT function, the type of the token being
matched is not saved in the lexer state while matching is in progress. It is
saved only after the whole token has been matched.
// $ANTLR start "ML_COMMENT"
public final void mML_COMMENT() throws RecognitionException {
    try {
        int _type = ML_COMMENT;
        int _channel = DEFAULT_TOKEN_CHANNEL;
        // Expr1.g:24:2: ( '/*' ( options {greedy=false; } : ( . )* ) '*/' )
        // Expr1.g:24:4: '/*' ( options {greedy=false; } : ( . )* ) '*/'
        {
            match("/*");
            // Expr1.g:24:9: ( options {greedy=false; } : ( . )* )
            // Expr1.g:24:41: ( . )*
            {
                // Expr1.g:24:41: ( . )*
                loop1:
                do {
                    int alt1=2;
                    int LA1_0 = input.LA(1);
                    if ( (LA1_0=='*') ) {
                        int LA1_1 = input.LA(2);
                        if ( (LA1_1=='/') ) {
                            alt1=2;
                        }
                        else if ( ((LA1_1>='\u0000' && LA1_1<='.')||(LA1_1>='0' && LA1_1<='\uFFFF')) ) {
                            alt1=1;
                        }
                    }
                    else if ( ((LA1_0>='\u0000' && LA1_0<=')')||(LA1_0>='+' && LA1_0<='\uFFFF')) ) {
                        alt1=1;
                    }
                    switch (alt1) {
                        case 1 :
                            // Expr1.g:24:41: .
                            {
                                matchAny();
                            }
                            break;
                        default :
                            break loop1;
                    }
                } while (true);
            }
            match("*/"); //This is where the error occurs
            skip();
        }
        state.type = _type;
        state.channel = _channel;
    }
    finally {
    }
}
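The ordering problem can be reproduced without the ANTLR runtime. The following is only a plain-Java sketch: the class StateOrderDemo, the field stateType (standing in for state.type) and the method names are made up for illustration, and the value 4 is the ML_COMMENT type printed in the run above.

```java
public class StateOrderDemo {
    static final int ML_COMMENT = 4;   // token type value printed in the run above
    static int stateType = 0;          // stands in for state.type; 0 means "not set yet"

    // Mirrors the shape of the generated rule: the type lives in a local
    // variable until every match has succeeded.
    static void mlComment(String input) {
        int type = ML_COMMENT;
        if (!input.startsWith("/*")) {
            throw new IllegalStateException("expected /*");
        }
        int end = input.indexOf("*/", 2);
        if (end < 0) {
            // Same position as the failing match("*/") in the generated code.
            throw new IllegalStateException("expected */ but found EOF");
        }
        stateType = type;              // only reached when the whole token matched
    }

    // Returns the value an error handler would observe in stateType.
    static int typeSeenByHandler(String input) {
        stateType = 0;
        try {
            mlComment(input);
        } catch (IllegalStateException e) {
            // An error handler would run here, before stateType was assigned.
        }
        return stateType;
    }

    public static void main(String[] args) {
        System.out.println(typeSeenByHandler("/*\n123\n"));   // unterminated comment
        System.out.println(typeSeenByHandler("/* ok */"));    // terminated comment
    }
}
```

For the unterminated input the handler sees 0, for the terminated one it sees 4, which matches the "0" and "4" printed by getErrorMessage above.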
Since the state is saved only after that point, I can't get the token type
by overriding functions. So I think the only option is to add a lexer rule
that matches the erroneous input and throws an exception from inside its
action (it should be a user-defined one that the ANTLR infrastructure does
not catch).
ML_COMMENT_ERR
    : '/*' ( (~('*/'))* ) { System.out.println("Throw the Required Error here"); }
    ;
It works. But I think it would be nice if the lexer carried extra
information about which token it is matching right now, for example
state.type_being_parsed = _type; at the top of the function.
Please let me know whether my approach is correct.
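Both ideas above (recording the type at the top of the rule, and throwing a user-defined exception) can be tried out in plain Java without ANTLR. This is only a sketch under assumptions: TrackingLexerDemo, UnterminatedTokenException, typeBeingParsed and skipComment are hypothetical names, none of them exist in the ANTLR runtime, and the input is assumed to start with "/*". The exception is unchecked on the assumption that the ANTLR 3 lexer only handles RecognitionException, so a RuntimeException would reach the caller.

```java
public class TrackingLexerDemo {
    static final int ML_COMMENT = 4;   // token type value from the run above

    // Hypothetical unchecked exception carrying the token type and line.
    static class UnterminatedTokenException extends RuntimeException {
        final int tokenType;
        final int line;

        UnterminatedTokenException(int tokenType, int line) {
            super("line " + line + ": unterminated token of type " + tokenType);
            this.tokenType = tokenType;
            this.line = line;
        }
    }

    // Hypothetical counterpart of the proposed state.type_being_parsed.
    static int typeBeingParsed = 0;

    // Skips a comment that starts at index 0 (input is assumed to begin
    // with "/*") and returns the index just past the closing "*/".
    static int skipComment(String input) {
        typeBeingParsed = ML_COMMENT;          // recorded before any matching
        int end = input.indexOf("*/", 2);
        if (end < 0) {
            // EOF reached: the line number is 1 plus the newlines consumed.
            int line = 1;
            for (char c : input.toCharArray()) {
                if (c == '\n') {
                    line++;
                }
            }
            throw new UnterminatedTokenException(typeBeingParsed, line);
        }
        return end + 2;
    }

    public static void main(String[] args) {
        System.out.println(skipComment("/* ok */ rest"));
        try {
            skipComment("/*\n123\n");
        } catch (UnterminatedTokenException e) {
            // The caller can now report which token was left open.
            System.out.println(e.getMessage());
        }
    }
}
```

Because the type is recorded before matching starts, the error path knows that the open token is ML_COMMENT, which is the information missing from the MismatchedTokenException above.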
Thanks,
Gokul.
On Thu, Jul 16, 2009 at 3:44 PM, Gokulakannan Somasundaram <
gokul007 at gmail.com> wrote:
> Hi,
> I am trying to filter out multi-line comments, for which I am using the
> following token definition (provided on antlr.org)
> ML_COMMENT
> : '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };
>
> But I intend to provide an informative error message when EOF occurs
> without any '*/'. Can someone help me achieve this? I am trying out a
> lot of things, but nothing seems to work, and I seem to be missing some
> basic fact/knowledge.
>
> Thanks,
> Gokul.
>