[antlr-interest] Examining characters in lexer

Dennis Brothers brothers at bros.com
Fri Mar 13 08:36:26 PDT 2009


Aaargh.  (Sound of hand hitting forehead)

It's always the dumb, simple things that are the hardest to see.

Efficiency isn't a major concern - I'm parsing one-line Lucene-like
query expressions.  (But I'd still like to see your suggestion.)

Thanks -
     - Dennis

On Mar 13, 2009, at 11:13 AM, Jim Idle wrote:

> Dennis Brothers wrote:
>> OK, I tried it, and I'm getting an error I don't know how to  
>> interpret:
>>
>> [10:19:33] error(10):  internal error:
>> org.antlr.analysis.NFAToDFAConverter.getPredicatesPerNonDeterministicAlt(Unknown Source):
>> no AST/token for nonepsilon target w/o predicate
>>
>> That is emitted three times when I try to generate code.
>>
>> Here's the lexer section:
>>
>> NEWLINE	:	'\r'? '\n' ;
>> WS  	:	(' '|'\t'|NEWLINE)+ {$channel=HIDDEN;} ;
>> STRING 	:	( '0'..'9'|'_'|'\'' | LETTER )+ ;
>> LETTER	:	{ Char.IsLetter( input.LA(1) ) }?=> . ;
>>
> lexer grammar f;
>
> NEWLINE :       '\r'? '\n' ;
> WS      :       (' '|'\t'|NEWLINE)+ {$channel=HIDDEN;} ;
> STRING  :       ( '0'..'9'|'_'|'\'' | LETTER )+ ;
> fragment
> LETTER  : { Char.IsLetter( input.LA(1)) }?=> . ;
>
> You left the fragment specifier off your LETTER rule. Without it,
> LETTER is a real token rule that clashes with the invocation of the
> same rule in STRING, and causes all sorts of other problems ;-)
>
> If you are bothered about efficiency here, you might find that the
> following generates better code:
>
> Jim


