[antlr-interest] White spaces not allowed

Gavin Lambert antlr at mirality.co.nz
Mon Jan 12 12:19:45 PST 2009


At 08:53 13/01/2009, Dominic Tardif wrote:
 >Hello everyone!  I've been working on this grammar for quite some
 >time now, and it works quite well except for one little detail:
 >white spaces are not allowed.
[...]
 >stmt:  ID ' ' function_id STMT_END      -> ^(STMT ID function_id)
 >      |  ID '=' expr STMT_END           -> ^('=' ID expr)
 >      |  NEWLINE                        ->
 >      ;

Your grammar is expecting to see NEWLINE tokens...

[...]
 >NEWLINE:  ('\r'? '\n')+;
 >WS:       (' '|'\t'|'\r'|'\n')+ {skip();};

... but your NEWLINE and WS tokens overlap (WS also matches '\r' and
'\n'), such that if there is any WS before (or possibly even after) a
newline, the newline will be consumed as part of the WS token and
skipped without generating a NEWLINE token.
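To illustrate the overlap, here is a minimal Python sketch of a simplified lexer model. It assumes that at each position every rule is tried, the longest match wins, and ties go to the rule listed first. That is an approximation for illustration, not ANTLR's generated lexer code, and the ID pattern is a hypothetical stand-in for the poster's (elided) ID rule:

```python
import re

# Simplified lexer model (an assumption, not ANTLR's implementation):
# longest match wins; on a tie the rule listed first wins.
# The ID regex is a hypothetical stand-in for the poster's ID rule.
RULES = [
    ("NEWLINE", re.compile(r"(?:\r?\n)+")),    # NEWLINE: ('\r'? '\n')+;
    ("WS", re.compile(r"[ \t\r\n]+")),         # WS: (' '|'\t'|'\r'|'\n')+ {skip();};
    ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]
SKIPPED = {"WS"}  # tokens discarded, like the skip() action


def tokenize(text):
    """Return the (rule, text) pairs the parser would receive."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps a tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("a\nb"))   # bare newline: NEWLINE wins the tie with WS
print(tokenize("a \nb"))  # space before: WS matches " \n", no NEWLINE
print(tokenize("a\n b"))  # space after: WS matches "\n ", no NEWLINE
```

In this model only the bare newline produces a NEWLINE token. With a space on either side, WS produces the longer match and the newline disappears along with it.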

Having said that, I'm not entirely sure why you are using NEWLINE
tokens in your parser; in most cases they look optional
anyway, so it seems like they could just be removed (though you
might need to change some 'stmt+'s to 'stmt*'s as well).


That's not the real problem, though.  The real problem is that
quoted space you have in the stmt rule above.  Whenever you use a
quoted literal in a parser rule, it effectively creates a new
lexer rule, so you then have two lexer rules representing
spaces: one that matches exactly one space and one that matches
one or more spaces, tabs, and newlines.  The two are going
to fight over the same input.  Just remove this space (it shouldn't
be necessary anyway, since WS already skips the whitespace between
tokens) and it should behave.
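The same simplified model can show the two space rules fighting. This sketch assumes that the implicit token for the quoted ' ' (called T_SPACE here, a hypothetical name) is tried before WS and therefore wins ties. Otherwise the longer match wins as before. EQ and INT are hypothetical stand-ins for '=' and a simple expr:

```python
import re

# Simplified longest-match model (an assumption, not ANTLR's real lexer).
# T_SPACE is a hypothetical name for the implicit token created by the
# quoted ' ' in the stmt rule; listing it first makes it win ties with WS.
WS_RULE = ("WS", re.compile(r"[ \t\r\n]+"))
OTHER_RULES = [
    ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("EQ", re.compile(r"=")),         # hypothetical name for '='
    ("INT", re.compile(r"[0-9]+")),   # hypothetical stand-in for expr
]
WITH_LITERAL = [("T_SPACE", re.compile(r" ")), WS_RULE] + OTHER_RULES
WITHOUT_LITERAL = [WS_RULE] + OTHER_RULES
SKIPPED = {"WS"}  # discarded, like skip()


def tokenize(text, rules):
    """Return the (rule, text) pairs the parser would receive."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:  # strictly longer wins; ties keep earlier rule
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# A single space becomes T_SPACE, which `ID '=' expr` does not expect.
print(tokenize("x = 1", WITH_LITERAL))
# Two spaces are swallowed by WS, so `ID ' ' function_id` gets no T_SPACE.
print(tokenize("a  b", WITH_LITERAL))
# Without the literal, WS skips all the whitespace consistently.
print(tokenize("x = 1", WITHOUT_LITERAL))
```

Under these assumptions, a lone space reaches the parser as a token wherever it appears, while a run of spaces does not. Either way, some input with whitespace fails to parse. Dropping the literal leaves WS as the only whitespace rule.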


