[antlr-interest] grammar notation (every char except...)

Johannes Luber jaluber at gmx.de
Fri Apr 20 15:08:16 PDT 2007


bace.spam at gmx.net wrote:
> Hi all,
> 
> I am totally new to ANTLR, but I have some experience with other parser generators. I want to recognize something like

I can only help with parser grammars for v3, which will probably be
released as a final version next month, so I suggest learning v3 instead.
You can download the betas, though, and use ANTLRWorks. A few points of
interest are shown here:
<http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required>
If you still prefer 2.7.7, you may get a few pointers nonetheless.

One general difference between ANTLR 3 and 2.7.7 is that v3 uses single
quotes ('') instead of double quotes ("") as string delimiters.
> 
> "// comment/goes^&on //" and
> "## comment/goes^&on ##"
> 
> So I want to allow everything inside except "//" and "##". It is a principle to keep tokens as atomic as possible, isn't it? I think

Do you want to allow '##' in '//' comments and the other way around? It
looks that way.

> TOKEN_COMMENT : "//" .* "//";
> 
> is not recommended. A better option would be
> 
> TOKEN_SLASH : '/';
> 
> I could also imagine defining
> 
> TOKEN_TAG : "//";
> 
> instead of TOKEN_SLASH.
> 
> 
> How can I specify the content (all characters allowed except "//") in the grammar with ANTLR (I use 2.7.7)?
> 
> comment
>   :  TOKEN_TAG ~("//" | "##")* TOKEN_TAG
>   ;

Adapting the ML_COMMENT rule from the tutorial:

TOKEN_COMMENT : '//' ( options {greedy=false;} : . )* '//' ;

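The greedy=false option makes the loop stop at the first closing '//'
instead of running on to the last one in the input. The rule also matches
comments that span several lines, because . matches '\n' as well.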
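To make the non-greedy behaviour concrete, here is a rough Python emulation of what the loop in TOKEN_COMMENT does. It is only an analogy, not ANTLR code, and match_comment is a hypothetical name: before '.' consumes another character, it checks whether the closing delimiter matches at that position and, if so, leaves the loop.

```python
from typing import Optional


def match_comment(text: str, pos: int = 0, delim: str = "//") -> Optional[int]:
    """Roughly emulate  delim ( options {greedy=false;} : . )* delim  at pos.

    Returns the index just past the closing delimiter, or None if the text
    at pos does not start a complete comment.
    """
    if not text.startswith(delim, pos):
        return None
    i = pos + len(delim)
    # Non-greedy loop: before '.' consumes another character, check whether
    # the closing delimiter matches here; if it does, leave the loop.
    while i + len(delim) <= len(text):
        if text.startswith(delim, i):
            return i + len(delim)
        i += 1  # '.' consumes any single character, including '\n'
    return None  # no closing delimiter: the rule fails


s = "// first\nline // code //"
print(repr(s[:match_comment(s)]))  # '// first\nline //'
```

Because the loop stops at the first closing delimiter, the matched comment can never contain '//' itself. Passing delim="##" gives the same behaviour for the other comment style. In Python, re.match(r"//.*?//", s, re.DOTALL) behaves the same way.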

> and many other notations like ( . | ~"//" | ~"##" )* are not accepted. Does anyone have an idea how to solve this problem?

( . | ~"//" | ~"##" )* would recognize everything, because the .
alternative alone already matches any character. (~( '//' | '##' ))*
may give you the desired behaviour, but I can't guarantee that ~ works
on strings, too.
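Whether ~ accepts multi-character strings is exactly what is uncertain here. The intended content can also be described one character at a time: any character except '/' and '#', or a '/' not followed by another '/', or a '#' not followed by another '#'. The sketch below expresses that idea as a Python regular expression with negative lookahead. It is an analogy only, not ANTLR syntax, and the names BODY, SLASH_COMMENT and HASH_COMMENT are hypothetical. It follows the literal ~("//" | "##") from the question, so neither sequence may appear inside either kind of comment.

```python
import re

# Hypothetical names, for illustration only (not ANTLR syntax).
# The body is built one character at a time: any character except '/' and
# '#', or a '/' not followed by '/', or a '#' not followed by '#'.
# The body can therefore never contain '//' or '##'.
# [^/#] also matches '\n', so comments may span several lines.
BODY = r"(?:[^/#]|/(?!/)|#(?!#))*"
SLASH_COMMENT = re.compile("//" + BODY + "//")
HASH_COMMENT = re.compile("##" + BODY + "##")

print(bool(SLASH_COMMENT.fullmatch("// comment/goes^&on //")))  # True
print(bool(SLASH_COMMENT.fullmatch("// a ## b //")))            # False
```

If '##' should be allowed inside '//' comments (and vice versa), each comment's body only needs to exclude its own delimiter, e.g. (?:[^/]|/(?!/))* for the '//' style.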

Best regards,
Johannes Luber

