[antlr-interest] Examining characters in lexer
Dennis Brothers
brothers at bros.com
Fri Mar 13 07:27:15 PDT 2009
OK, I tried it, and I'm getting an error I don't know how to interpret:
[10:19:33] error(10): internal error: org.antlr.analysis.NFAToDFAConverter.getPredicatesPerNonDeterministicAlt(Unknown Source): no AST/token for nonepsilon target w/o predicate
That is emitted three times when I try to generate code.
Here's the lexer section:
NEWLINE : '\r'? '\n' ;
WS : (' '|'\t'|NEWLINE)+ {$channel=HIDDEN;} ;
STRING : ( '0'..'9'|'_'|'\'' | LETTER )+ ;
LETTER : { Char.IsLetter( input.LA(1) ) }?=> . ;
- Dennis Brothers
On Mar 12, 2009, at 5:01 PM, Jim Idle wrote:
> Dennis Brothers wrote:
>> Is there a special symbol or method that returns the character about
>> to be scanned?
> input.LA(1)
> input.LA(2)
>
> etc.
>> In order to handle a variety of (natural) languages,
>> I'd like to use Unicode categories to detect various character types
>> (particularly letters).
>>
>> I want to do something like
>>
>> fragment LETTER : { Char.IsLetter( $char ) } ?=> . ;
>>
>> where $char is the next character to be scanned and Char.IsLetter() is
>> a .NET method that examines a character's Unicode category and returns
>> true if it's one of the letter categories.
>>
>> While I'm at it, is it legal to use a gated predicate like the above
>> in a lexer?
>>
> Yes, but you might find you need to finesse things so you don't create
> issues such as rules that never match and so on.
>
> It is fine to code the ranges in ANTLR, but you can end up with some big
> lexers.
>
> However, overall, you don't want the lexer to fail, so it is better to
> accept things that are not actually valid, but then check the validity in
> a routine that can say "Character xx is not a valid identifier
> character", as otherwise you just get
>
> Illegal character: xxx
>
> and that does not have enough context for a user.
>
> Jim
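
For illustration, here is a rough sketch of the kind of validation routine Jim describes: the lexer accepts a broad set of characters, and a separate routine checks each one and reports it with some context. This is only a sketch under assumptions. The names WordValidator, IsWordChar and Validate are hypothetical and are not part of ANTLR or its C# runtime; the line and column arguments are assumed to come from the token being checked. The accepted set mirrors the STRING rule above (digits, '_', '\'' and anything Char.IsLetter() accepts).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ANTLR: validates text that a
// permissive lexer rule has already accepted.
public static class WordValidator
{
    // Mirrors the STRING rule: '0'..'9', '_', '\'' or any Unicode letter.
    public static bool IsWordChar(char c)
    {
        return (c >= '0' && c <= '9') || c == '_' || c == '\'' || Char.IsLetter(c);
    }

    // Returns one message per invalid character, including its position,
    // so the user sees more than a bare "Illegal character".
    // line and column are assumed to be the token's start position.
    public static List<string> Validate(string text, int line, int column)
    {
        List<string> errors = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (!IsWordChar(c))
            {
                errors.Add(string.Format(
                    "line {0}:{1}: character '{2}' (U+{3:X4}) is not a valid identifier character",
                    line, column + i, c, (int)c));
            }
        }
        return errors;
    }
}
```

A call such as WordValidator.Validate(tokenText, tokenLine, tokenColumn) could be made from a lexer action or from a later pass over the tokens; which is more convenient depends on the grammar. Note that Char.IsLetter(char) looks at one UTF-16 code unit at a time, so characters outside the Basic Multilingual Plane would need the string/index overload instead.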