[antlr-interest] ANTLR generating invalid Java

Jim Idle jimi at temporal-wave.com
Fri Aug 8 14:42:12 PDT 2008


On Fri, 2008-08-08 at 14:11 -0700, Oren Ben-Kiki wrote:

> > ... your yaml language seems to be using space indent for structural meaning
> > (not a good idea if you ask me),
> 
> Be that as it may... the idea was to have ANTLR *not* do anything
> special with white space - just leave it alone and let it be parsed
> like any other character. Possible?


OK - I am still not following you exactly here. Do you mean that you
want the spaces and tabs to come back to the parser as individual
tokens? In which case you just specify them as part of lexer rules or in
their own lexer rule and they will come back as their own tokens.
Whitespace has no special meaning to ANTLR unless you make it so.

So you can say:

SPACE : ' ';
TAB : '\t';

or even, to fold a run of four spaces into a single TAB token:
TAB : '\t';

SPACE : ' ' (('   ')=>'   ' { $type = TAB; })? ;

And the parser can say:

rule : indent DATA;

indent : (SPACE|TAB)* ;
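
As a rough illustration of what those lexer rules hand to the parser, here is a hand-written Java sketch. It is not ANTLR-generated code and does not use the ANTLR runtime. The class name WhitespaceTokens, the Type enum and the idea that DATA is "everything up to the next space or tab" are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceTokens {
    // Hypothetical token types mirroring the SPACE, TAB and DATA names above.
    enum Type { SPACE, TAB, DATA }

    /** Splits one line into tokens, keeping whitespace visible as tokens. */
    static List<Type> tokenize(String line) {
        List<Type> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\t') {
                tokens.add(Type.TAB);
                i++;
            } else if (c == ' ') {
                // Mirrors the predicate variant: four spaces in a row become one TAB.
                if (line.startsWith("    ", i)) {
                    tokens.add(Type.TAB);
                    i += 4;
                } else {
                    tokens.add(Type.SPACE);
                    i++;
                }
            } else {
                // Assumption: DATA runs until the next space or tab.
                while (i < line.length()
                        && line.charAt(i) != ' ' && line.charAt(i) != '\t') {
                    i++;
                }
                tokens.add(Type.DATA);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Six leading spaces: four fold into TAB, two stay as SPACE tokens.
        System.out.println(tokenize("      key")); // prints [TAB, SPACE, SPACE, DATA]
    }
}
```

Nothing is skipped or hidden here, so a parser rule such as indent can count the leading SPACE and TAB tokens itself.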



> > so if you just throw it away you won't have
> > context. Having looked at the specs for Yaml, I am not convinced that ANTLR
> > is the correct tool for you to use to be honest.
> 
> Yes, YAML syntax is hairy - almost as bad as Perl's :-) Still, it
> would be nice to use ANTLR for it, since it should allow retargeting
> the parser to different platforms...


Sure. It might not be the best way to parse it, but there is nothing
wrong with an ANTLR parser that does it. The hairiness is what makes
all your semantic predicates necessary, of course, and when they are
combined in the generated code they produce all your complicated if
statements. The predicates are written in the target language, and
ANTLR does not really know what they are, other than that they are
something that evaluates to boolean true or false.

We can see that:
:
  { A }?
| { !A }?
;

covers all cases, so there is no need to ask if (A || !A) and then if (A)
else if (!A), but ANTLR does not know this. Of course, it is also
generating too many repeats of some of these tests, as you surmise.
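
To make the redundancy concrete, here is a small Java sketch. It is written by hand to imitate the general shape of such a decision and is not actual ANTLR output. It assumes the combined guard is the disjunction of the two predicates, and the names PredicateDispatch, opaque and simplified are invented for the example. The predicate is passed in as a BooleanSupplier to stress that the tool sees only an opaque boolean expression.

```java
import java.util.function.BooleanSupplier;

class PredicateDispatch {
    /** What a tool emits when it cannot reason about the predicate text. */
    static int opaque(BooleanSupplier a) {
        // Always true for a stable predicate, but the tool cannot tell.
        if (a.getAsBoolean() || !a.getAsBoolean()) {
            if (a.getAsBoolean()) {
                return 1;                 // first alternative
            } else if (!a.getAsBoolean()) {
                return 2;                 // second alternative
            }
        }
        return -1; // "no viable alternative"; unreachable for a stable predicate
    }

    /** What a human writes, knowing that A and !A are exhaustive. */
    static int simplified(BooleanSupplier a) {
        return a.getAsBoolean() ? 1 : 2;
    }

    public static void main(String[] args) {
        System.out.println(opaque(() -> true) == simplified(() -> true));   // prints true
        System.out.println(opaque(() -> false) == simplified(() -> false)); // prints true
    }
}
```

Both methods pick the same alternative. The difference is only that the first one evaluates the predicate several times and carries a dead error branch.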

I wonder if you are trying to do too much in the parser. Perhaps what
you really need is for the parser to pick up anything that looks like
valid syntax and produce a tree, which you then walk to match up
indent levels and so on?
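
One possible shape for that second pass, as a minimal Java sketch: it assumes the loose parse has already produced a flat list of lines, each tagged with its indent width, and it nests them with a stack. The Node class, the nest method and the use of a synthetic root at indent -1 are all assumptions for illustration. Nothing here comes from the ANTLR runtime or from the YAML specification.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IndentNesting {
    /** Hypothetical node: one parsed line plus the indent width found in front of it. */
    static final class Node {
        final int indent;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(int indent, String text) {
            this.indent = indent;
            this.text = text;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? text : text + children;
        }
    }

    /** Nests a flat list of lines by indent; assumes every indent is >= 0. */
    static Node nest(List<Node> flat) {
        Node root = new Node(-1, "root"); // synthetic parent, never popped
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (Node line : flat) {
            // Close every block indented at least as deeply as this line.
            while (open.peek().indent >= line.indent) {
                open.pop();
            }
            open.peek().children.add(line);
            open.push(line);
        }
        return root;
    }

    public static void main(String[] args) {
        List<Node> flat = List.of(
                new Node(0, "a"), new Node(2, "b"),
                new Node(2, "c"), new Node(0, "d"));
        System.out.println(nest(flat)); // prints root[a[b, c], d]
    }
}
```

With the indent bookkeeping moved here, the grammar itself only has to recognise lines and their leading whitespace, which should need far fewer predicates.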

Jim