[antlr-interest] Fwd: Examining characters in lexer

Dennis Brothers brothers at bros.com
Thu Mar 12 14:29:49 PDT 2009


Meant to send this to the list - didn't realize the list was set up
for personal replies.

Begin forwarded message:

> From: Dennis Brothers <brothers at bros.com>
> Date: March 12, 2009 5:13:21 PM EDT
> To: Jim Idle <jimi at temporal-wave.com>
> Subject: Re: [antlr-interest] Examining characters in lexer
>
> I want to use it in a rule something like
>
> WORD : ( LETTER | '0'..'9' | '_' )+ ;
>
> so failing to match simply terminates the WORD rule.
>
>    - Dennis Brothers
>
> On Mar 12, 2009, at 5:01 PM, Jim Idle wrote:
>
>> Dennis Brothers wrote:
>>> Is there a special symbol or method that returns the character about
>>> to be scanned?
>> input.LA(1)
>> input.LA(2)
>>
>> etc.
>>> In order to handle a variety of (natural) languages,
>>> I'd like to use Unicode categories to detect various character types
>>> (particularly letters).
>>>
>>> I want to do something like
>>>
>>> fragment LETTER : { Char.IsLetter( $char ) } ?=> . ;
>>>
>>> where $char is the next character to be scanned and Char.IsLetter() is
>>> a .NET method that examines a character's Unicode category and returns
>>> true if it's one of the letter categories.
>>>
>>> While I'm at it, is it legal to use a gated predicate like the above
>>> in a lexer?
>>>
>> Yes, but you might find you need to finesse things so you don't create
>> issues such as rules that never match and so on.
>>
>> It is fine to code the ranges in ANTLR, but you can end up with some big
>> lexers.
>>
>> However, overall, you don't want the lexer to fail, so it is better to
>> accept things that are not actually valid, but then check the validity in
>> a routine that can say "Character xx is not a valid identifier
>> character", as otherwise you just get
>>
>> Illegal character: xxx
>>
>> and that does not have enough context for a user.
>>
>> Jim
>
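
Jim's suggestion above (let the lexer accept more than is strictly valid,
then check the token text in a separate routine that can report a useful
message) could be sketched roughly as follows. This is only an
illustration under some assumptions: it is written in Java rather than
.NET, with Character.isLetter() standing in for Char.IsLetter(); the
WordCheck class and its method names are hypothetical and not part of
ANTLR; and the token text is assumed to come from a permissive lexer rule.

```java
// Sketch only: WordCheck and its methods are hypothetical helpers, not ANTLR API.
final class WordCheck {

    // Mirrors the WORD rule from the thread: Unicode letters, '0'..'9', and '_'.
    static boolean isWordChar(int codePoint) {
        return Character.isLetter(codePoint)
                || (codePoint >= '0' && codePoint <= '9')
                || codePoint == '_';
    }

    // Returns the index of the first invalid character in the text, or -1 if none.
    static int firstInvalidIndex(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isWordChar(cp)) {
                return i;
            }
            // Step by code point so supplementary characters are not split.
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Builds an error message with context, or returns null if the text is valid.
    static String describeProblem(String text) {
        int at = firstInvalidIndex(text);
        if (at < 0) {
            return null;
        }
        int cp = text.codePointAt(at);
        return String.format(
                "Character '%s' (U+%04X) at offset %d is not a valid identifier character in \"%s\"",
                new String(Character.toChars(cp)), cp, at, text);
    }

    public static void main(String[] args) {
        // "na\u00EFve_9" contains a non-ASCII letter; "foo$bar" contains an invalid '$'.
        for (String token : new String[] {"na\u00EFve_9", "foo$bar"}) {
            String problem = describeProblem(token);
            System.out.println(problem == null ? token + ": ok" : problem);
        }
    }
}
```

In a grammar, a check like describeProblem() would presumably be called
from the action of the permissive rule or from a later validation pass,
so the user sees which character was rejected and where, instead of a
bare "Illegal character" error.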
