[antlr-interest] Both hidden and required whitespace

Gavin Lambert antlr at mirality.co.nz
Thu May 8 13:58:38 PDT 2008


At 06:48 9/05/2008, Kaleb Pederson wrote:
 >expr: 'one' 'two' COLON (INDENT 'numbers')*;
 >COLON: ':';
 >fragment INDENT:	'\n' (' '|'\t')*;
 >WS: (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
[...]
 >where Token 9 is 'numbers'.  I presume that WS is consuming the
 >INDENT and thus I'm not seeing it in the stream.

It's more serious than that: your grammar cannot possibly 
produce INDENT tokens, since INDENT is a fragment rule.  Fragment 
rules can only be called from other lexer rules and never emit 
tokens themselves.  So for starters, remove the 'fragment' from 
the INDENT rule.
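
For illustration, this is roughly what the quoted grammar looks 
like with only that change applied.  It is an untested sketch; the 
grammar name IndentDemo is a placeholder of mine, and everything 
else is copied from your rules.

```antlr
// Sketch only: "IndentDemo" is a hypothetical grammar name.
grammar IndentDemo;

expr   : 'one' 'two' COLON (INDENT 'numbers')* ;

COLON  : ':' ;

// No longer a fragment, so the lexer can emit it as a token.
INDENT : '\n' (' '|'\t')* ;

// Unchanged from the quoted grammar.
WS     : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;} ;
```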

This will work in most cases, but not all.  Because INDENT 
requires a leading newline, it won't match on the first line of 
the file.  It also won't match if there is trailing whitespace 
before the newline, since the lexer will already be in the WS 
rule at that point and will continue matching there.  It won't 
work with Windows line endings ('\r\n') either, for a similar 
reason.

Where you should go from here depends on how complicated your 
grammar already is.  I had a similar need to express indentation 
in a grammar I worked on recently.  That grammar was simple 
enough (and had enough weird edge cases) that unhiding the WS rule 
and splitting it into separate WS and NL rules made the most 
sense.  The cost is that every parser rule must then state 
explicitly where whitespace and newlines are permitted.
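
As a rough idea of what that split could look like for your 
example, here is an untested sketch.  The grammar name 
IndentDemo is a placeholder, and the exact places where WS and NL 
appear in expr are my assumptions about what your input allows, 
not something taken from your grammar.

```antlr
// Sketch only: "IndentDemo" is a hypothetical grammar name, and the
// WS/NL placement in expr is an assumption about the input format.
grammar IndentDemo;

// WS and NL are ordinary tokens here, so the parser must say
// where they may occur.  Each 'numbers' line needs leading WS.
expr   : 'one' WS 'two' WS? COLON (WS? NL WS 'numbers')* ;

COLON  : ':' ;

// Accepts both Unix and Windows line endings.
NL     : '\r'? '\n' ;

// Horizontal whitespace only; no $channel=HIDDEN action.
WS     : (' '|'\t'|'\u000C')+ ;
```

The optional WS before NL is there to tolerate trailing whitespace 
at the end of a line, and a required WS after NL stands in for the 
indentation.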

I believe someone has written a Python grammar for ANTLR.  Python 
is an indentation-sensitive language, so it might be worth 
looking at how indentation is handled there.


