[antlr-interest] Whitespace issues

Thu May 7 23:15:20 PDT 2009

I'm having a problem turning this text into tokens. I know it has to be
something obvious, but I can't see it...

[%"Dear "%][%          name          %][%",

It has come to our attention that your account is in

arrears to the sum of "%][% debt %][%".

Please settle your account before "%][% deadline %][%" or we

will be forced to revoke your Licence to Thrill.

The Management."%]

In short, I want any white space after [% or before %] to be lumped into
that bracket's token. With this grammar

TSTART : '[%' WS*;
TSTOP : WS* '%]';
CSTART : TSTART '"';
CSTOP : '"' TSTOP;

fragment
WS : ('\r'|'\n'|'\t'|' ');

CHAR : ('\u0000'..'\uffff');

document
    : (content | block)*
    ;

content
    : CSTART CHAR+ CSTOP
    ;

block
    : TSTART CHAR* TSTOP
    ;

, white space is indeed grouped with its bracket and the tree looks
like I expect, except that any character after two newlines yields this
type of grouping in the above text:

[%"

,

It

[%"

.

Pl

l

l

.

Th

e

The console says:

> [01:41:01] problem matching token at 3:1 NoViableAltException('I'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 4:1 NoViableAltException('a'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 6:1 NoViableAltException('P'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 7:1 NoViableAltException('w'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 9:1 NoViableAltException('T'@[()* loopback of 19:9: ( WS )*])

Why would the two newlines in a row not be covered under CHAR+ in
between the start/stop bracket tokens?
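
To make the grouping I'm after concrete, here is a minimal sketch of it in
Python rather than ANTLR. The regex, the tokenize helper and the choice of
Python are illustrative assumptions and not part of my grammar; the sketch
only shows the token boundaries I expect and says nothing about how the
ANTLR lexer makes its decisions.

```python
import re

# Whitespace set copied from the WS fragment in the grammar above.
WS = r"[ \t\r\n]"

# Alternatives are tried left to right, so the bracket tokens get a chance
# before the one-character CHAR fallback. If a run of whitespace is not
# followed by '%]', the regex backtracks and the whitespace becomes CHARs.
TOKEN_RE = re.compile(
    rf'(?P<CSTART>\[%{WS}*")'
    rf'|(?P<CSTOP>"{WS}*%\])'
    rf"|(?P<TSTART>\[%{WS}*)"
    rf"|(?P<TSTOP>{WS}*%\])"
    rf"|(?P<CHAR>.)",
    re.DOTALL,  # let CHAR match newlines as well
)


def tokenize(text):
    """Return (token_type, lexeme) pairs for the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token in tokenize('[%"a\n\nb"%][%  name  %]'):
        print(token)
```

With this sketch, the two newlines inside the quoted content come out as two
plain CHAR tokens, and the spaces around name are absorbed into the TSTART
and TSTOP tokens, which is the behaviour I expected from the grammar.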

Thanks,

-=Chris