[antlr-interest] Whitespace issues
Christopher Laco
claco at chrislaco.com
Thu May 7 23:15:20 PDT 2009
I'm having a problem turning this text into tokens. I know it has to be
something obvious, but I can't see it...
[%"Dear "%][% name %][%",
It has come to our attention that your account is in
arrears to the sum of "%][% debt %][%".
Please settle your account before "%][% deadline %][%" or we
will be forced to revoke your Licence to Thrill.
The Management."%]
In short, I want any white space after [% or before %] to be lumped into
that bracket's token. With this grammar
TSTART : '[%' WS*;
TSTOP : WS* '%]';
CSTART : TSTART '"';
CSTOP : '"' TSTOP;
fragment
WS : ('\r'|'\n'|'\t'|' ');
CHAR : ('\u0000'..'\uffff');
document
    : (content | block)*
    ;
content
    : CSTART CHAR+ CSTOP
    ;
block
    : TSTART CHAR* TSTOP
    ;
, white space is indeed grouped with its bracket and the tree looks
like I expect, except that any character after two newlines yields this type
of grouping for the above text:
[%"
,
It
[%"
.
Pl
l
l
.
Th
e
The console says:
> [01:41:01] problem matching token at 3:1 NoViableAltException('I'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 4:1 NoViableAltException('a'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 6:1 NoViableAltException('P'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 7:1 NoViableAltException('w'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 9:1 NoViableAltException('T'@[()* loopback of 19:9: ( WS )*])
Why would the two newlines in a row not be covered under CHAR+ in
between the start/stop bracket tokens?
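
To pin down what "lumped into that bracket's token" means, here is a minimal sketch of the tokenization I'm after, written as a small Python regex scanner rather than ANTLR. The names WS, TOKEN_RE and tokenize are made up for illustration, and merging runs of CHAR into one entry is only there to keep the output short. The regex engine backtracks when white space is not followed by %], which may not be how the ANTLR lexer decides, so this only shows the intended result, not how ANTLR gets there.

```python
import re

# Same character set as the WS fragment in the grammar, zero or more times.
WS = r"[ \t\r\n]*"

# Alternatives are tried in order, so the quoted forms come first.
TOKEN_RE = re.compile(
    "|".join([
        rf'(?P<CSTART>\[%{WS}")',   # [% + optional white space + opening quote
        rf'(?P<CSTOP>"{WS}%\])',    # closing quote + optional white space + %]
        rf"(?P<TSTART>\[%{WS})",    # [% with trailing white space folded in
        rf"(?P<TSTOP>{WS}%\])",     # %] with leading white space folded in
        r"(?P<CHAR>.)",             # anything else, one character at a time
    ]),
    re.DOTALL,  # let '.' match newlines too, like '\u0000'..'\uffff'
)


def tokenize(text):
    """Return (token name, lexeme) pairs, merging adjacent CHAR tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "CHAR" and tokens and tokens[-1][0] == "CHAR":
            tokens[-1] = ("CHAR", tokens[-1][1] + lexeme)
        else:
            tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    sample = '[%"Dear "%][% name %][%",\n\nIt has come "%]'
    for kind, lexeme in tokenize(sample):
        print(kind, repr(lexeme))
```

With this, the two newlines after the comma stay inside the CHAR run, because white space only joins a stop token when %] actually follows it.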
Thanks,
-=Chris