[antlr-interest] Examining characters in lexer

Dennis Brothers brothers at bros.com
Fri Mar 13 07:27:15 PDT 2009


OK, I tried it, and I'm getting an error I don't know how to interpret:

[10:19:33] error(10):  internal error: org.antlr.analysis.NFAToDFAConverter.getPredicatesPerNonDeterministicAlt(Unknown Source): no AST/token for nonepsilon target w/o predicate

That is emitted three times when I try to generate code.

Here's the lexer section:

NEWLINE	:	'\r'? '\n' ;
WS  	:	(' '|'\t'|NEWLINE)+ {$channel=HIDDEN;} ;
STRING 	:	( '0'..'9'|'_'|'\'' | LETTER )+ ;
LETTER	:	{ Char.IsLetter( input.LA(1) ) }?=> . ;

     - Dennis Brothers

On Mar 12, 2009, at 5:01 PM, Jim Idle wrote:

> Dennis Brothers wrote:
>> Is there a special symbol or method that returns the character about
>> to be scanned?
> input.LA(1)
> input.LA(2)
>
> etc.
>> In order to handle a variety of (natural) languages,
>> I'd like to use Unicode categories to detect various character types
>> (particularly letters).
>>
>> I want to do something like
>>
>> fragment LETTER : { Char.IsLetter( $char ) } ?=> . ;
>>
>> where $char is the next character to be scanned and Char.IsLetter() is
>> a .NET method that examines a character's Unicode category and returns
>> true if it's one of the letter categories.
>>
>> While I'm at it, is it legal to use a gated predicate like the above
>> in a lexer?
>>
> Yes, but you might find you need to finesse things so you don't create
> issues such as rules that never match and so on.
>
> It is fine to code the ranges in ANTLR, but you can end up with some big
> lexers.
>
> However, overall, you don't want the lexer to fail, so it is better to
> accept things that are not actually valid, but then check the validity in
> a routine that can say "Character xx is not a valid identifier
> character", as otherwise you just get
>
> Illegal character: xxx
>
> and that does not have enough context for a user.
>
> Jim
>
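The advice quoted above (let the lexer accept characters permissively, then check validity in a routine that can report a message with context) could be sketched roughly as follows. This is only an illustration in Java, ANTLR's default target, and is not code from this thread: the class IdentifierCheck and its methods are hypothetical names, Character.isLetter stands in for .NET's Char.IsLetter, and the allowed set simply mirrors the STRING rule shown earlier.

```java
// Hypothetical helper, not part of ANTLR: validates the text of a token
// that the lexer matched permissively.
final class IdentifierCheck {

    // Mirrors the STRING rule in the message: digits, '_', '\'' or any Unicode letter.
    static boolean isAllowed(int codePoint) {
        return (codePoint >= '0' && codePoint <= '9')
                || codePoint == '_'
                || codePoint == '\''
                || Character.isLetter(codePoint);
    }

    // Returns the index of the first disallowed character, or -1 if the text is valid.
    static int firstInvalidIndex(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isAllowed(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one character
        }
        return -1;
    }

    // Builds a message with more context than a bare "Illegal character",
    // or returns null when there is nothing to report.
    static String describe(String text, int line) {
        int bad = firstInvalidIndex(text);
        if (bad < 0) {
            return null;
        }
        int cp = text.codePointAt(bad);
        return String.format(
                "line %d: character U+%04X at offset %d of \"%s\" is not a valid identifier character",
                line, cp, bad, text);
    }
}
```

A routine like this would presumably be called from a lexer or parser action with the token's text and line number, so that the lexer itself never has to reject the input.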


