[antlr-interest] lexer precedence

David Weiser davidann at gmail.com
Thu Dec 9 11:41:54 PST 2010


Howdy,

I have a lexer which has the following rules (I'm modding the XML
Parser from http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer ):

WS  : {tagMode}?=>   (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;}
   ;

PCDATA : { !tagMode }?=> (~'<')+
   ;

The problem I have is that the lexer ends up tokenizing sequences like
"\n\n\n\n" as PCDATA instead of WS.

There appears to be a nondeterminism between WS and PCDATA, since
'\n' is matched both by the '\n' alternative in WS and by ~'<' in
PCDATA.  How can I get around this?
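
To make the overlap concrete, here is a rough Python model of the two
rules above. It is only a sketch, not ANTLR-generated code: the names
(WS_CHARS, viable_rules, and so on) are made up for illustration, and it
assumes the gated predicates {tagMode}?=> and { !tagMode }?=> simply
switch each rule on or off.

```python
# Hypothetical model of the WS and PCDATA rules; not ANTLR output.

# Characters accepted by the WS rule: ' ', '\r', '\t', '\u000C', '\n'
WS_CHARS = {' ', '\r', '\t', '\u000C', '\n'}


def ws_matches(ch):
    """True if ch is one of the WS alternatives."""
    return ch in WS_CHARS


def pcdata_matches(ch):
    """True if ch is matched by ~'<' (any character except '<')."""
    return ch != '<'


def viable_rules(ch, tag_mode):
    """Return the rules that could start at ch for a given tagMode.

    Assumption: a gated predicate disables its rule entirely when the
    predicate evaluates to false.
    """
    rules = []
    if tag_mode and ws_matches(ch):
        rules.append("WS")
    if not tag_mode and pcdata_matches(ch):
        rules.append("PCDATA")
    return rules


# Every WS character is also accepted by ~'<', so the character sets overlap.
overlap = {ch for ch in WS_CHARS if pcdata_matches(ch)}
```

Under that assumption, viable_rules('\n', False) gives only PCDATA and
viable_rules('\n', True) gives only WS, so in this model a run like
"\n\n\n\n" seen while tagMode is false can only come out as PCDATA.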

--
Thanks,
dw




