[antlr-interest] Both hidden and required whitespace
Gavin Lambert
antlr at mirality.co.nz
Thu May 8 13:58:38 PDT 2008
At 06:48 9/05/2008, Kaleb Pederson wrote:
>expr: 'one' 'two' COLON (INDENT 'numbers')*;
>COLON: ':';
>fragment INDENT: '\n' (' '|'\t')*;
>WS: (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;};
[...]
>where Token 9 is 'numbers'. I presume that WS is consuming the
>INDENT and thus I'm not seeing it in the stream.
It's more serious than that: your grammar cannot possibly
produce INDENT tokens, since INDENT is a fragment rule. Fragment
rules only act as helpers for other lexer rules and never emit
tokens of their own. So for starters, remove the 'fragment' from
the INDENT rule.
This will work in most cases, but not all. Because INDENT
requires a leading newline, it won't match on the first line of
the file. It also won't match if there is trailing whitespace on
the line before the newline, since the lexer will already be in
the WS rule at that point and will continue matching there. It
won't work with Windows end-of-lines (\r\n) either, for a similar
reason.
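As a rough illustration of that first step, the quoted grammar
with 'fragment' removed might look like the following. This is an
untested sketch in ANTLR 3 syntax. The grammar name Indent is made
up for the example, and everything else is taken from the quoted
rules. The caveats listed above still apply to it.

```antlr
grammar Indent;   // hypothetical grammar name

expr   : 'one' 'two' COLON (INDENT 'numbers')* ;

COLON  : ':' ;

// No longer a fragment, so the lexer can emit it as a token.
// It needs a leading '\n', so it cannot match on the first line.
INDENT : '\n' (' '|'\t')* ;

// Unchanged from the quoted grammar. It can still consume '\n'
// itself, which is the source of the trailing-whitespace and
// "\r\n" caveats described above.
WS     : (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;} ;
```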
Where you should go from here depends on how complicated your
grammar is already. I had a similar need to express indentation
in a grammar I worked on recently. That grammar was simple enough
(and had enough weird edge cases) that unhiding the WS rule and
splitting it into separate WS and NL rules made the most sense.
Obviously this requires modifying all the parser rules to
explicitly indicate where whitespace and newlines are permitted.
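A minimal sketch of that split, applied to the quoted grammar, could
look something like this. It is untested and is not the grammar
mentioned above. The grammar name IndentExplicit is hypothetical, as
is the choice of where whitespace is required versus optional.

```antlr
grammar IndentExplicit;   // hypothetical grammar name

// Whitespace is now visible to the parser, so each rule has to
// say where it may appear. Assumed here: 'one' and 'two' must be
// separated, and each 'numbers' line must be indented.
expr   : 'one' WS 'two' WS? COLON WS? (NL WS 'numbers' WS?)* ;

COLON  : ':' ;

// Newline token; the optional '\r' also accepts Windows line endings.
NL     : '\r'? '\n' ;

// Horizontal whitespace only, no longer sent to the hidden channel.
WS     : (' '|'\t')+ ;
```

The cost is visible even in this tiny rule: every place where
whitespace may occur has to be spelled out, which is why this
approach only suits fairly small grammars.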
I believe someone has written a Python grammar for ANTLR. Python
is an indent-sensitive language, so it might be worth looking at
how indentation is handled there.