[antlr-interest] Fwd: c# 2.0 grammar help
Gyula László
gyula.laszlo at profund.hu
Fri Dec 29 08:22:22 PST 2006
Hello,
On 2006.12.28., at 20:13, James Briant wrote:
>
> I'm trying to create a grammar for C# 2.0 by following the spec.
> I'm stuck on the lexer! I'm not sure how best to handle the
> different character types. This is what I have done:
>
>
> LETTER_CHARACTER
> : c=. { IsUnicodeLetterChar( c ) }?
> | u=UNICODE_ESCAPE_SEQUENCE { IsUnicodeLetterChar( u ) }?
> ;
>
> COMBINING_CHARACTER
> : c=. { IsUnicodeCombiningCharacter(c) }?
> | u=UNICODE_ESCAPE_SEQUENCE { IsUnicodeCombiningCharacter( u ) }?
> ;
>
Mr. Parr has already answered this; however, I think it's better to use
Unicode character ranges in the lexer instead of semantic predicates
(or am I missing the point?).
The following is from Mr. Parr's Java example grammar:
/**I found this char range in JavaCC's grammar, but Letter and Digit
overlap.
Still works, but...
*/
fragment
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
fragment
JavaIDDigit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
ANTLR emits a warning for these rules, but they work OK...
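The warning most likely stems from the overlap mentioned in the grammar
comment: apart from the ASCII digits, every JavaIDDigit range falls inside
the Letter range '\u0100'..'\u1fff', so a rule that offers both as
alternatives is ambiguous for those characters. As a rough check outside
ANTLR, here is a small Java sketch (the class and method names are
hypothetical, not part of the grammar) that counts how many digit code
points are also matched by Letter:

```java
// Illustrative sketch only: RangeOverlap and its helpers are hypothetical names.
class RangeOverlap {
    // Inclusive ranges copied from the Letter fragment above.
    static final int[][] LETTER = {
        {0x0024, 0x0024}, {0x0041, 0x005a}, {0x005f, 0x005f}, {0x0061, 0x007a},
        {0x00c0, 0x00d6}, {0x00d8, 0x00f6}, {0x00f8, 0x00ff}, {0x0100, 0x1fff},
        {0x3040, 0x318f}, {0x3300, 0x337f}, {0x3400, 0x3d2d}, {0x4e00, 0x9fff},
        {0xf900, 0xfaff}
    };

    // Inclusive ranges copied from the JavaIDDigit fragment above.
    static final int[][] DIGIT = {
        {0x0030, 0x0039}, {0x0660, 0x0669}, {0x06f0, 0x06f9}, {0x0966, 0x096f},
        {0x09e6, 0x09ef}, {0x0a66, 0x0a6f}, {0x0ae6, 0x0aef}, {0x0b66, 0x0b6f},
        {0x0be7, 0x0bef}, {0x0c66, 0x0c6f}, {0x0ce6, 0x0cef}, {0x0d66, 0x0d6f},
        {0x0e50, 0x0e59}, {0x0ed0, 0x0ed9}, {0x1040, 0x1049}
    };

    // True if code point c lies in any of the given inclusive ranges.
    static boolean inRanges(int[][] ranges, int c) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) return true;
        }
        return false;
    }

    // Number of code points matched by both JavaIDDigit and Letter.
    static int countOverlap() {
        int n = 0;
        for (int[] r : DIGIT) {
            for (int c = r[0]; c <= r[1]; c++) {
                if (inRanges(LETTER, c)) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("ASCII '0' also a Letter?  " + inRanges(LETTER, '0'));
        System.out.println("U+0660 also a Letter?     " + inRanges(LETTER, 0x0660));
        System.out.println("Overlapping code points:  " + countOverlap());
    }
}
```

If the overlap is indeed the cause, the lexer presumably still works because
ANTLR resolves the ambiguity in favour of the alternative listed first, so
the overlapping characters are simply matched as letters.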
byz
Gyula László
email:gyula.laszlo AT profund.hu
http://profund.hu