[antlr-interest] Manual Lookaheads
Jeff Barnes
jbarnesweb at yahoo.com
Fri Nov 11 19:37:05 PST 2005
I have a lot of this kind of thing going on in my
lexer:
protected
TLE_SF
{
    boolean hasX020B = false;
    boolean hasX36 = false;
    int length = 0;
}
    :   byte1:'\u0000'..'\u007F'
        byte2:. {length = (byte1<<8)+byte2-8;}
        '\u00D3'
        '\u00A0'
        '\u0090'
        '\0'
        '\0'
        '\0'
        (   {LA(2) == '\2'}? {length -= LA(1);}
            (X02[LA(1),0X0B])! {hasX020B = true;}
        |   {LA(2) == '\u0036'}? {length -= LA(1);} X36! {hasX36 = true;}
        )+
        (   {length > 0}?
            {LA(2) == '\1'}? {length -= LA(1);} X01!
        |   {LA(2) == '\2' && ((LA(3) == '\u0087') || (LA(3) == '\15') || (LA(3) == '\u000C'))}?
            {length -= LA(1);} (X02[LA(1),LA(3)])!
        |   {LA(2) == '\u0080'}? {length -= LA(1);} X80!
        )*
        {inputState.guessing != 0 || (hasX020B && hasX36 && length==0)}?
    ;
The subrules are ambiguous without k > 1, which is why
they are protected. Of course, this generates a plethora
of nondeterminism warnings. But the lexer has held up
under very extensive testing. It's just that doing this
*feels* hackish.
Is there something I'm missing? The language (MO:DCA)
is based on structured fields like the one described in
the rule above. Does anybody have experience parsing
AFP files?
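For readers who don't know the format, the bookkeeping that the rule above does with LA() can be sketched in plain Java, outside ANTLR. This is only a rough illustration of the length accounting, not the lexer itself. The class and method names (TleFieldSketch, isValidTle) are made up for the example. It assumes each triplet starts with a length byte followed by an ID byte, as the LA(1)/LA(2) predicates suggest, and that the X02 type byte sits in the third position, as the LA(3) tests suggest. Unlike the rule, it does not enforce triplet ordering or restrict which other triplet IDs may appear.

```java
class TleFieldSketch {

    // Hypothetical helper: checks the same conditions the rule's final
    // predicate checks (X02 of type 0x0B seen, X36 seen, length used up).
    static boolean isValidTle(byte[] sf) {
        if (sf.length < 8) {
            return false; // too short for the 8-byte header
        }
        int b1 = sf[0] & 0xFF;
        int b2 = sf[1] & 0xFF;
        if (b1 > 0x7F) {
            return false; // the rule restricts byte1 to 0x00..0x7F
        }
        // Same arithmetic as the rule: total length minus the 8 header bytes.
        int remaining = ((b1 << 8) + b2) - 8;

        // Identifier D3 A0 90 followed by three zero bytes.
        int[] fixed = {0xD3, 0xA0, 0x90, 0x00, 0x00, 0x00};
        for (int i = 0; i < fixed.length; i++) {
            if ((sf[2 + i] & 0xFF) != fixed[i]) {
                return false;
            }
        }

        boolean hasX020B = false;
        boolean hasX36 = false;
        int pos = 8;
        while (remaining > 0) {
            if (pos + 1 >= sf.length) {
                return false; // need at least the length and ID bytes
            }
            int tripletLen = sf[pos] & 0xFF;    // what the rule reads as LA(1)
            int tripletId = sf[pos + 1] & 0xFF; // what the rule reads as LA(2)
            if (tripletLen < 2 || pos + tripletLen > sf.length) {
                return false; // triplet would run past the data
            }
            if (tripletId == 0x02 && tripletLen >= 3
                    && (sf[pos + 2] & 0xFF) == 0x0B) {
                hasX020B = true; // assumed: type byte is the third byte
            } else if (tripletId == 0x36) {
                hasX36 = true;
            }
            remaining -= tripletLen; // mirrors {length -= LA(1);}
            pos += tripletLen;
        }
        return hasX020B && hasX36 && remaining == 0;
    }
}
```

Calling isValidTle on a byte array that starts at the two length bytes returns true only when both required triplets were seen and the declared length is consumed exactly. The same running-length idea is what the predicates in the rule keep track of by hand.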
Regards,
Jeff