[antlr-interest] simple grammar with wildcard

Wed Apr 23 10:12:17 PDT 2008

On Thu, Apr 24, 2008 at 2:53 AM, Johannes Luber <jaluber at gmx.de> wrote:
> jason zhang wrote:
>
>
> > Hi, cai
> > I removed the NONQUOTE definition and used a wildcard instead.
> > ------------------------------
> > grammar Test;
> >
> > program : attribute*;
> > attribute : n=ID ':' '"' e=(.*)'"' LINEBREAK
> >     { System.out.println("match attrname==============="+$n.text+" attrvalue="+$e.text);};
> > ID    :    ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')+;
> > WS : ( '\t' | ' ' )+     { $channel=HIDDEN; } ;
> > LINEBREAK
> >    :    '\r'?'\n';
> > --------------------------------
> > When I test this grammar by running the generated Java code, e is never
> > set anywhere in that code, and I get a NullPointerException. How can I
> > capture the text matched by (.*)?
> >
> > thanks
> >
> > -jason
> >
>
>  You have encountered a known bug: a label on a parenthesized block,
> as in "e=(...)", is not supported yet. You can either drop the
> parentheses or move the block into a new subrule.
>
>  Johannes
>
Or you should be able to use "(e+=.)*". However, this collects a list
of tokens rather than a single token, so I don't think $e.text will
work. I think it should work with a subrule.
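
A rough, untested sketch of the subrule approach, in case it helps. The
rule name attrValue and the token name QUOTE are made up for this
example; neither is in your grammar. I use ~QUOTE rather than .* inside
the new rule so that the loop stops at the closing quote, and I have
not checked what $e.text returns when the value is empty.

```
grammar Test;

program   : attribute* ;

// e labels a rule reference (not a parenthesized block),
// so $e.text can be used in the action.
attribute : n=ID ':' QUOTE e=attrValue QUOTE LINEBREAK
            { System.out.println("attrname=" + $n.text + " attrvalue=" + $e.text); }
          ;

// Hypothetical helper rule: any run of tokens other than the quote token.
attrValue : ( ~QUOTE )* ;

QUOTE     : '"' ;
ID        : ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')+ ;
WS        : ( '\t' | ' ' )+ { $channel=HIDDEN; } ;
LINEBREAK : '\r'? '\n' ;
```
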
However, that grammar won't do what you want. The wildcard in the
parser matches any token, not any character, so it cannot match
characters that no lexer rule matches. You could change the NONQUOTE
rule so that it no longer overlaps with ID or LINEBREAK. Alternatively,
when more than one lexer rule can match a given character sequence, the
rule specified first wins, so if you go back to the original grammar
and reorder it so that ATTRVALUE comes after ID and LINEBREAK, those
two should match first. Then in your parser use
(ID|ATTRVALUE|LINEBREAK).
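
Something along these lines is what I have in mind for the reordering.
This is an untested sketch: I don't have your original grammar in front
of me, so the ATTRVALUE definition below is a guess, and attrValue is a
made-up helper rule. Note that ATTRVALUE has to exclude ':' here,
otherwise the lexer could swallow "name:" as a single token.

```
attribute : n=ID ':' '"' e=attrValue '"' LINEBREAK
            { System.out.println("attrname=" + $n.text + " attrvalue=" + $e.text); }
          ;

// Hypothetical helper rule: the value is whatever tokens the lexer
// produces between the two quote tokens.
attrValue : ( ID | ATTRVALUE | LINEBREAK )* ;

ID        : ('a'..'z'|'A'..'Z'|'0'..'9'|'-'|'_')+ ;
LINEBREAK : '\r'? '\n' ;
WS        : ( '\t' | ' ' )+ { $channel=HIDDEN; } ;

// Guessed definition. Listed after ID and LINEBREAK so that those rules
// win whenever they match the same characters as ATTRVALUE.
ATTRVALUE : ( ~( '"' | ':' | ' ' | '\t' | '\r' | '\n' ) )+ ;
```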

Tom.