[antlr-interest] Whitespace issues

Christopher Laco claco at chrislaco.com
Thu May 7 23:15:20 PDT 2009


I'm having a problem turning this text into tokens. I know it has to be
something obvious, but I can't see it...




[%"Dear "%][%          name          %][%",

It has come to our attention that your account is in
arrears to the sum of "%][% debt %][%".

Please settle your account before "%][% deadline %][%" or we
will be forced to revoke your Licence to Thrill.

The Management."%]





In short, I want any whitespace after [% or before %] to be lumped into
that bracket's token. With this grammar


TSTART : '[%' WS*;
TSTOP  : WS* '%]';
CSTART : TSTART '"';
CSTOP  : '"' TSTOP;

fragment
WS : ('\r'|'\n'|'\t'|' ');

CHAR : ('\u0000'..'\uffff');

document
    : (content | block)*
    ;

content
    : CSTART CHAR+ CSTOP
    ;

block
    : TSTART CHAR* TSTOP
    ;



, whitespace is indeed grouped with its bracket and the tree looks like
I expect, except that any character following two newlines ends up
grouped like this in the above text:


[%"
,
It

[%"
.
Pl

l
l
.

Th
e


The console says:


> [01:41:01] problem matching token at 3:1 NoViableAltException('I'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 4:1 NoViableAltException('a'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 6:1 NoViableAltException('P'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 7:1 NoViableAltException('w'@[()* loopback of 19:9: ( WS )*])
> [01:41:01] problem matching token at 9:1 NoViableAltException('T'@[()* loopback of 19:9: ( WS )*])


Why would the two newlines in a row not be covered under CHAR+ in
between the start/stop bracket tokens?
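For comparison, here is a minimal Python sketch (not ANTLR, and not part of the grammar above) of the token stream the grammar is meant to produce. It assumes that whitespace should join a stop bracket only when `%]` actually follows it; the regex engine backtracks to check that, so whitespace not followed by `%]` (such as two newlines inside quoted content) stays as individual CHAR tokens. The function name `tokenize` and the pattern table are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical pattern table mirroring the grammar's token names.
# Order matters: the first pattern that matches at the current position wins.
WS = r"[\r\n\t ]"
TOKEN_SPEC = [
    ("CSTART", rf'\[%{WS}*"'),  # '[%' WS* '"'
    ("TSTART", rf"\[%{WS}*"),   # '[%' WS*
    ("CSTOP", rf'"{WS}*%\]'),   # '"' WS* '%]'
    ("TSTOP", rf"{WS}*%\]"),    # WS* '%]', only if '%]' really follows
    ("CHAR", r"[\s\S]"),        # any single character, newlines included
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def tokenize(text):
    """Return a list of (token_type, token_text) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break  # CHAR always matches, so the loop always advances
    return tokens


if __name__ == "__main__":
    sample = '[%"Dear "%][%  name  %][%",\r\n\r\nIt has come"%]'
    for kind, value in tokenize(sample):
        print(kind, repr(value))
```

Because each candidate is tried in full before falling back to CHAR, the whitespace decision here depends on whether `%]` eventually appears, not on a fixed number of lookahead characters.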


Thanks,

-=Chris
