[antlr-interest] case sensitivity for ANTLR v3 lexers

Terence Parr parrt at cs.usfca.edu
Tue May 16 14:04:46 PDT 2006


On May 16, 2006, at 12:00 PM, Don Caton wrote:

> Ter:
>
> Just my 2 cents, FWIW...
>
> I don't think Antlr should concern itself with any of this.  Keep things
> as simple as possible, and only do exact ordinal comparisons of strings.

Well, people can subclass and change match if they want except that  
sets are matched inline without calling a method (because they are  
complicated beasts).  Those cannot be changed w/o ANTLR code  
generation changes.

Hm...well, people could do the following: come up with a char stream  
that yields uppercase to the lexer but stores the real char.  Then  
all char refs just have to be uppercase in the lexer and we're cool,  
right?  That way I don't have to mess with it...somebody can perhaps  
override a standard char stream.
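
A rough sketch of that idea, in case it helps.  The class below is a
hypothetical stand-in, not the ANTLR runtime's char stream: the names
CaseFoldingCharStream, LA, consume, index and substring are assumptions
chosen for illustration.  Lookahead hands the lexer uppercase, while token
text is cut from the stored characters, so the original case survives.

```java
// Hypothetical stand-in for a lexer's character stream (not the ANTLR API).
class CaseFoldingCharStream {
    static final int EOF = -1;

    private final char[] data; // the real characters, case preserved
    private int p = 0;         // index of the next character to consume

    CaseFoldingCharStream(String input) {
        data = input.toCharArray();
    }

    // Lookahead i characters ahead (1-based). The lexer only ever sees
    // uppercase here, so its rules can be written in uppercase alone.
    int LA(int i) {
        int idx = p + i - 1;
        if (i < 1 || idx >= data.length) {
            return EOF;
        }
        return Character.toUpperCase(data[idx]);
    }

    void consume() {
        if (p < data.length) {
            p++;
        }
    }

    int index() {
        return p;
    }

    // Token text comes from the stored characters, not the folded ones.
    String substring(int start, int stop) {
        return new String(data, start, stop - start + 1);
    }
}
```

With input "Select x", LA(1) yields 'S' and LA(2) yields 'E', yet
substring(0, 5) still returns "Select".  Character.toUpperCase folds one
char at a time and ignores locale, which is probably fine for keywords but
is a simplification.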

> This is probably a codegen issue more than a core Antlr issue, but one of
> the biggest frustrations for me in Antlr 2.x is that the whole thing
> assumes 8-bit characters and strings.  There are hard-coded references to
> string, stream, char, LPSTR, cout, etc. throughout the generated code as
> well as the runtime code.  These should be defines or typedefs, so
> generating a Unicode parser (UTF-16) would be as simple as doing something
> like '#define ANTLR_STRING wstring', '#define ANTLR_CHAR wchar_t', and so
> on.

I think Ric is thinking about this.

> Another problem is the various hard-coded ANSI, English strings in error
> messages, and hard coded references to cout.  Please abstract anything
> like this so that it can be overridden, so error messages can be
> localized, and other output mechanisms can be used other than an ANSI
> console.  It's a big world out there and modern applications today need to
> support Unicode and easy localization.

All error strings are in a template group now :)  People will be able  
to send in their language.stg files and errors will come out in the  
locale's strings. :)  All ST stuff allows char encoding ...
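
To illustrate the general shape of per-locale error strings, here is a
small sketch.  It does not use StringTemplate or a .stg file; it swaps in
java.text.MessageFormat and a plain map, and the class name ErrorMessages
and the message key "noViableChar" are made up for the example.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical message catalog: language code -> (message key -> pattern).
class ErrorMessages {
    private final Map<String, Map<String, String>> byLanguage = new HashMap<>();

    void add(String language, String key, String pattern) {
        byLanguage.computeIfAbsent(language, k -> new HashMap<>()).put(key, pattern);
    }

    // Use the pattern for the locale's language; fall back to English
    // when that language (or that key) has not been contributed yet.
    String format(Locale locale, String key, Object... args) {
        Map<String, String> table = byLanguage.get(locale.getLanguage());
        if (table == null || !table.containsKey(key)) {
            table = byLanguage.get("en");
        }
        if (table == null || !table.containsKey(key)) {
            return key; // nothing registered at all: show the raw key
        }
        return new MessageFormat(table.get(key), locale).format(args);
    }
}
```

Someone would register "line {0}: unexpected char ''{1}''" under "en" and
a translated pattern under, say, "fr"; the code reporting the error only
names the key and passes the arguments.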

Ter


