[antlr-interest] Fwd: c# 2.0 grammar help

Fri Dec 29 08:22:22 PST 2006

Hello,

On 2006.12.28., at 20:13, James Briant wrote:

>
> I'm trying to create a grammar for C# 2.0 by following the spec.  
> I'm stuck on the lexer! I'm not sure how best to handle the  
> different character types. This is what I have done:
>
>
> LETTER_CHARACTER
>     :    c=. { IsUnicodeLetterChar( c ) }?
>     |    u=UNICODE_ESCAPE_SEQUENCE { IsUnicodeLetterChar( u ) }?
>     ;
>
> COMBINING_CHARACTER
>     :    c=. {    IsUnicodeCombiningCharacter(c) }?
>     |    u=UNICODE_ESCAPE_SEQUENCE { IsUnicodeCombiningCharacter( u ) }?
>     ;
>

Mr. Parr has already answered this; however, I think it's better to use
Unicode character ranges in the lexer (or am I missing the point?).
The following is from Mr. Parr's Java example grammar:

/** I found this char range in JavaCC's grammar, but Letter and Digit overlap.
    Still works, but...
 */
fragment
Letter
     :  '\u0024' |
        '\u0041'..'\u005a' |
        '\u005f' |
        '\u0061'..'\u007a' |
        '\u00c0'..'\u00d6' |
        '\u00d8'..'\u00f6' |
        '\u00f8'..'\u00ff' |
        '\u0100'..'\u1fff' |
        '\u3040'..'\u318f' |
        '\u3300'..'\u337f' |
        '\u3400'..'\u3d2d' |
        '\u4e00'..'\u9fff' |
        '\uf900'..'\ufaff'
     ;

fragment
JavaIDDigit
     :  '\u0030'..'\u0039' |
        '\u0660'..'\u0669' |
        '\u06f0'..'\u06f9' |
        '\u0966'..'\u096f' |
        '\u09e6'..'\u09ef' |
        '\u0a66'..'\u0a6f' |
        '\u0ae6'..'\u0aef' |
        '\u0b66'..'\u0b6f' |
        '\u0be7'..'\u0bef' |
        '\u0c66'..'\u0c6f' |
        '\u0ce6'..'\u0cef' |
        '\u0d66'..'\u0d6f' |
        '\u0e50'..'\u0e59' |
        '\u0ed0'..'\u0ed9' |
        '\u1040'..'\u1049'
    ;

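These two rules throw a warning (presumably because every non-ASCII range in
JavaIDDigit falls inside Letter's '\u0100'..'\u1fff' range), but they work OK...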
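
A minimal sketch of how the two fragments could be combined into an
identifier token. The rule name Identifier is an assumption here and is not
part of the excerpt above; treat this as an illustration, not a tested C# lexer:

```
// Hypothetical token rule built on the Letter and JavaIDDigit fragments.
// An identifier starts with a Letter and continues with letters or digits.
Identifier
     :  Letter ( Letter | JavaIDDigit )*
     ;
```

For C# you would probably want to drop '\u0024' ('$') from Letter, since '$'
is not a valid C# identifier character, and handle the leading '@' of
verbatim identifiers separately.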

byz

Gyula László

email:gyula.laszlo AT profund.hu
http://profund.hu
