[antlr-interest] Manual Lookaheads

Fri Nov 11 19:37:05 PST 2005

My lexer contains a lot of rules like this one:

protected
TLE_SF
{
    boolean hasX020B = false;
    boolean hasX36 = false;
    int length = 0;
}
:
    byte1:'\u0000'..'\u007F'  
    // length covers the whole field; subtract the 8 introducer bytes matched here
    byte2:. {length = (byte1<<8)+byte2-8;}
    '\u00D3' 
    '\u00A0' 
    '\u0090' 
    '\0' 
    '\0' 
    '\0' 
    (
        {LA(2) == '\2'}? {length -= LA(1);} (X02[LA(1),0X0B])! {hasX020B = true;}
    |   {LA(2) == '\u0036'}? {length -= LA(1);} X36! {hasX36 = true;}
    )+
    ({length > 0}?
        {LA(2) == '\1'}? {length -= LA(1);} X01!
    |   {LA(2) == '\2' && ((LA(3) == '\u0087') ||
            (LA(3) == '\15') || (LA(3) == '\u000C'))}?
        {length -= LA(1);} (X02[LA(1),LA(3)])!
    |   {LA(2) == '\u0080'}? {length -= LA(1);} X80!
    )*
    {inputState.guessing != 0 || (hasX020B && hasX36 && length==0)}?
;

The subrules (X01, X02, X36, X80) are ambiguous
unless k > 1, so I made them protected and choose
between them by hand with LA() predicates. Of course,
this generates a plethora of nondeterminism warnings.
The lexer works, and it has held up under very
extensive testing, but doing it this way *feels*
hackish.
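
For comparison, here is a minimal sketch of the same
length bookkeeping in plain Java, outside ANTLR. It is
not my lexer, just an illustration: the class name
TripletWalk, the method tripletIds and the sample
layout are made up for this example. It assumes an
8-byte introducer whose first two bytes hold the total
field length, followed by triplets that each start
with a length byte and an ID byte, as in the rule
above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR or of any AFP library.
final class TripletWalk {
    // Same constant the grammar rule subtracts from the length.
    static final int INTRODUCER_LENGTH = 8;

    /**
     * Walks the triplets of one structured field and returns their IDs,
     * or null if the triplet lengths do not add up to the declared length.
     */
    static List<Integer> tripletIds(byte[] sf) {
        if (sf.length < INTRODUCER_LENGTH) {
            return null;
        }
        // Big-endian 16-bit length, minus the introducer bytes.
        int remaining = (((sf[0] & 0xFF) << 8) | (sf[1] & 0xFF)) - INTRODUCER_LENGTH;
        if (remaining < 0 || INTRODUCER_LENGTH + remaining > sf.length) {
            return null;
        }
        List<Integer> ids = new ArrayList<Integer>();
        int pos = INTRODUCER_LENGTH;
        while (remaining > 0) {
            if (remaining < 2) {
                return null; // no room for a length byte plus an ID byte
            }
            int tripletLength = sf[pos] & 0xFF; // what LA(1) sees in the rule
            if (tripletLength < 2 || tripletLength > remaining) {
                return null; // triplet would overrun the field
            }
            ids.add(sf[pos + 1] & 0xFF);        // what LA(2) sees in the rule
            pos += tripletLength;
            remaining -= tripletLength;
        }
        return ids; // remaining == 0 here, like the final predicate
    }
}
```

Calling TripletWalk.tripletIds(bytes) on one field
yields the triplet IDs in order, so checks such as
"has an X02 and an X36" become ordinary list tests
instead of predicates spread through the rule.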

Is there something I'm missing? The language (MO:DCA)
is built from structured fields like the one matched
by the rule above. Does anybody have experience
parsing AFP files?

Regards,
Jeff