[antlr-interest] Reg Multi-line comments

Thu Jul 16 04:56:38 PDT 2009

After a little research, this is what I found. The only information I can
get from the ANTLR infrastructure is that it was expecting a '*' but hit
EOF. There is no way to find out that this '*' belongs to the lexer token
ML_COMMENT. My analysis is based on the following.

This is the grammar I used:

grammar Expr1;

@lexer::members {
    //@Override
    public String getErrorMessage(RecognitionException me, String[] tokenNames_)
    {
        String tokenName="<unknown>";
        if( me instanceof MismatchedTokenException )
        {
            System.out.println(state.type);
            System.out.println(ML_COMMENT);
            if(state.type == ML_COMMENT)
            {
                System.out.println("First Breakthrough");
            }
        }
        System.out.println(me);
        me.printStackTrace();
        return tokenName;
    }
}

ML_COMMENT
    :    '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };

ml_comment
    :    ('\r\n')* { System.out.println("Success"); }
    ;

When I entered
/*
123
Ctrl-Z

I got
0
4
MismatchedTokenException(-1!=42)
MismatchedTokenException(-1!=42)
        at org.antlr.runtime.Lexer.match(Lexer.java:167)
        at Expr1Lexer.mML_COMMENT(Expr1Lexer.java:119)
        at Expr1Lexer.mTokens(Expr1Lexer.java:161)
        at org.antlr.runtime.Lexer.nextToken(Lexer.java:84)
        at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:95)
        at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
        at org.antlr.runtime.CommonTokenStream.LA(CommonTokenStream.java:300)
        at Expr1Parser.ml_comment(Expr1Parser.java:45)
        at Main.main(Main.java:13)
line 3:0 <unknown>
Success
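
But in the generated mML_COMMENT function, the type of the token being
parsed is not stored in the lexer state while matching is in progress. It is
stored only after the token has been matched completely, which is why
state.type printed 0 above instead of 4.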

    // $ANTLR start "ML_COMMENT"
    public final void mML_COMMENT() throws RecognitionException {
        try {
            int _type = ML_COMMENT;
            int _channel = DEFAULT_TOKEN_CHANNEL;
            // Expr1.g:24:2: ( '/*' ( options {greedy=false; } : ( . )* ) '*/' )
            // Expr1.g:24:4: '/*' ( options {greedy=false; } : ( . )* ) '*/'
            {
            match("/*");

            // Expr1.g:24:9: ( options {greedy=false; } : ( . )* )
            // Expr1.g:24:41: ( . )*
            {
            // Expr1.g:24:41: ( . )*
            loop1:
            do {
                int alt1=2;
                int LA1_0 = input.LA(1);

                if ( (LA1_0=='*') ) {
                    int LA1_1 = input.LA(2);

                    if ( (LA1_1=='/') ) {
                        alt1=2;
                    }
                    else if ( ((LA1_1>='\u0000' && LA1_1<='.')||(LA1_1>='0' && LA1_1<='\uFFFF')) ) {
                        alt1=1;
                    }

                }
                else if ( ((LA1_0>='\u0000' && LA1_0<=')')||(LA1_0>='+' && LA1_0<='\uFFFF')) ) {
                    alt1=1;
                }

                switch (alt1) {
                case 1 :
                    // Expr1.g:24:41: .
                    {
                    matchAny();

                    }
                    break;

                default :
                    break loop1;
                }
            } while (true);

            }

            match("*/");    //This is where the error occurs

             skip();

            }

            state.type = _type;
            state.channel = _channel;
        }
        finally {
        }
    }
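
For anyone skimming the generated method, its logic roughly boils down to: consume characters until the next two are the closing delimiter, then match that delimiter. Here is a minimal plain-Java sketch of the same idea, with no ANTLR dependency. The class and method names are made up for illustration, and it simplifies the generated lookahead a little, but it shows why an unterminated comment surfaces as "got -1 (EOF), expected 42 ('*')":

```java
// Stand-alone illustration; not ANTLR code. All names are hypothetical.
final class CommentScanSketch {
    static final int EOF = -1;

    // Lookahead: character at index i, or EOF past the end of the input.
    static int la(String in, int i) {
        return i < in.length() ? in.charAt(i) : EOF;
    }

    // Scans a comment at the start of `in` (assumed to begin with "/*").
    // Returns the index just past the closing "*/", or throws if input ends first.
    static int scanComment(String in) {
        int p = 2; // skip the opening "/*"
        // Non-greedy ( . )*: stop as soon as the next two chars are "*/",
        // and also stop at EOF, because '.' cannot match it.
        while (la(in, p) != EOF && !(la(in, p) == '*' && la(in, p + 1) == '/')) {
            p++;
        }
        // Rough equivalent of match("*/"): the next char must be '*'.
        if (la(in, p) != '*') {
            throw new IllegalStateException(
                "mismatched: got " + la(in, p) + ", expected " + (int) '*');
        }
        return p + 2;
    }
}
```

Calling `CommentScanSketch.scanComment("/*\n123\n")` throws with the same -1 versus 42 pair seen in the trace above, and nothing in that failure says which rule was being matched.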

Since the state is saved only after that point, I cannot get the token type
by overriding the error-reporting functions. So I think the only option is
to add a token rule that matches the erroneous input and throws an exception
from inside its action (it should be a user-defined one that the ANTLR
infrastructure does not catch).

ML_COMMENT_ERR
    :    '/*' (  (~('*/'))* ) {  System.out.println("Throw the Required Error here"); }
    ;
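
To illustrate why the exception should be a user-defined unchecked one, here is a small stand-alone Java sketch. It assumes the driver loop catches only its own checked recognition exception and recovers from it. The classes below are stand-ins written for this example, not ANTLR types, and all names are hypothetical:

```java
// Stand-alone illustration; these classes are stand-ins, not ANTLR types.
class RecognitionProblem extends Exception { // plays the role of a checked recognition error
    RecognitionProblem(String msg) { super(msg); }
}

class UnterminatedCommentException extends RuntimeException { // user-defined, unchecked
    final int line;
    UnterminatedCommentException(int line) {
        super("line " + line + ": multi-line comment is never closed");
        this.line = line;
    }
}

final class RecoverySketch {
    interface Rule { void run() throws RecognitionProblem; }

    // Mimics a driver that reports checked recognition errors and carries on.
    static String nextToken(Rule rule) {
        try {
            rule.run();
            return "ok";
        } catch (RecognitionProblem e) {
            return "recovered: " + e.getMessage();
        }
    }
}
```

A rule that throws `RecognitionProblem` is swallowed by `nextToken`, while one that throws `UnterminatedCommentException` propagates to the caller, which can then print whatever message it likes.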

It works. But I think it would be nice if the lexer carried extra
information about which token it is currently parsing, for example

state.type_being_parsed = _type; at the top of the function.
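
A rough stand-alone sketch of how such a field could be used follows. This is only an illustration of the proposal: `LexState`, `typeBeingParsed`, and the hand-written `mlComment` method are hypothetical and are not part of the ANTLR runtime or its generated code. The constant 4 simply mirrors the value printed for ML_COMMENT above.

```java
// Stand-alone illustration; LexState and typeBeingParsed are hypothetical,
// not part of the ANTLR runtime.
final class InProgressTypeSketch {
    static final int NONE = 0;
    static final int ML_COMMENT = 4;

    static final class LexState {
        int type = NONE;            // set only after a rule completes
        int typeBeingParsed = NONE; // proposed: set on entry to a rule
    }

    // Hand-written stand-in for the generated rule method.
    static void mlComment(LexState state, String in) {
        state.typeBeingParsed = ML_COMMENT; // recorded before any matching
        int end = in.indexOf("*/", 2);      // search after the opening "/*"
        if (!in.startsWith("/*") || end < 0) {
            throw new IllegalStateException("mismatched input");
        }
        state.type = ML_COMMENT;            // reached only on success
    }

    // What an error-message hook could do with the extra field.
    static String describe(LexState state) {
        if (state.typeBeingParsed == ML_COMMENT && state.type != ML_COMMENT) {
            return "unterminated multi-line comment";
        }
        return "<unknown>";
    }
}
```

With this, an error hook called after a failed `mlComment` sees `type == 0` but `typeBeingParsed == 4`, which is enough to produce a specific message instead of `<unknown>`.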

Please let me know whether my approach is correct.

Thanks,
Gokul.

On Thu, Jul 16, 2009 at 3:44 PM, Gokulakannan Somasundaram <
gokul007 at gmail.com> wrote:

> Hi,
>    I am trying to filter out multi-line comments, for which I am using the
> following token definition (provided on antlr.org)
> ML_COMMENT
>     :    '/*' ( options { greedy = false; } : .* ) '*/' { skip(); };
>
> But I intend to provide an informative error message when EOF occurs
> without any '*/'.  Can someone help me on how to achieve this? I am trying
> out a lot of things, but nothing seems to work and I seem to be missing
> some basic fact/knowledge.
>
> Thanks,
> Gokul.
>