[antlr-interest] Unicode character classes

Kirby Bohling kirby.bohling at gmail.com
Thu Dec 15 13:27:36 PST 2011


I believe this is a thread by the person working on the grammar
(Johannes Luber), whom I e-mailed:
http://www.antlr.org/pipermail/antlr-interest/2007-March/019601.html

I've used the type of rules described in that grammar with success.  I
simply e-mailed Johannes, and he either sent me his grammar or pointed
me to a copy online.

Not sure if that helps.  In the middle of that thread, Johannes does
describe a number of character classes.  It would be pretty
straightforward to create a fragment with a predicate for the character
classes, but that might be far less efficient than just specifying the
ranges, holes and all.  If you want to lex them differently, yes, all of
the classes will have to be enumerated specifically, unless you want to
specify them by exclusion (which seems more difficult).
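
As a rough sketch of the "holes and all" approach: rather than typing
the ranges by hand, a small script can compute them from the Unicode
database and print alternatives in the same '\uXXXX'..'\uXXXX' form as
the rule quoted below.  This is only an illustration, not anything from
Johannes's grammar.  The helper names (category_ranges, antlr_fragment)
and the rule name are made up, and it assumes you only need the Basic
Multilingual Plane and that your Python's Unicode tables are close
enough to the Unicode version you care about.

```python
import unicodedata


def category_ranges(category, limit=0xFFFF):
    """Return inclusive (first, last) code point ranges up to `limit`
    whose Unicode general category equals `category` (e.g. "Lu")."""
    ranges = []
    start = None
    for cp in range(limit + 1):
        in_class = unicodedata.category(chr(cp)) == category
        if in_class and start is None:
            start = cp                      # a new run begins here
        elif not in_class and start is not None:
            ranges.append((start, cp - 1))  # the run ended at a hole
            start = None
    if start is not None:                   # run extends to the limit
        ranges.append((start, limit))
    return ranges


def antlr_fragment(name, ranges):
    """Format ranges as one fragment rule with one alternative per range.
    The rule layout is an assumption modelled on the quoted example."""
    alts = []
    for first, last in ranges:
        if first == last:
            alts.append("'\\u%04X'" % first)
        else:
            alts.append("'\\u%04X'..'\\u%04X'" % (first, last))
    return "fragment %s\n    : %s\n    ;" % (name, "\n    | ".join(alts))


if __name__ == "__main__":
    # Hypothetical rule name; "Lu" is the uppercase-letter category.
    print(antlr_fragment("UNICODE_CLASS_LU", category_ranges("Lu")))
```

The output can be pasted into a lexer grammar and referenced from other
rules.  It will be long for the bigger classes, which is the trade-off
against the predicate approach mentioned above.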

Kirby

On Thu, Dec 15, 2011 at 2:32 PM, Bart Kiers <bkiers at gmail.com> wrote:
> Hi Christian,
>
> Sure.
> For example, the following rule:
>
>    LatinExtB_first4 : '\u0180'..'\u0183';
>
> will match any of the first 4 Latin Extended-B* characters.
>
> Regards,
>
> Bart.
>
>
> * http://en.wikipedia.org/wiki/List_of_Unicode_characters#Latin_Extended-B
>
>
> On Thu, Dec 15, 2011 at 9:23 PM, Christian <chwchw at gmx.de> wrote:
>
>> Hi community,
>>
>> I've read a couple of threads, but none of the questions about whether
>> ANTLR supports Unicode character classes were answered. Therefore, I
>> now pose the question:
>>
>> Does ANTLR support Unicode character classes? And if not, how can I
>> easily put them into a lexer grammar anyway?
>>
>> Regards,
>> Christian
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
>

