[antlr-interest] Q: Advice on localizing lexer

Sun Jun 14 01:42:27 PDT 2009

At 17:12 14/06/2009, C. Mundi wrote:
>This turns out to be very naive, and I see this getting ugly 
>fast.  Already I have to localize the DSL keywords so there's no 
>way around writing multiple lexers.  So far I have only two 
>languages: English and Japanese.  But if this catches on, other 
>users will want their own.  I'd like to minimize the number of 
>lexers I need to maintain or at least maximize code reuse between 
>them.
>
>I figure this question must come up for DSLs pretty 
>regularly.  Although we more or less accept using a subset 
>of Latin characters -- and usually just ASCII -- for general 
>purpose programming, the use case for DSLs almost begs for 
>localized identifiers and keywords.  The users in this case are 
>ordinary business people, not programmers.

Once you need configurable token definitions (especially ones 
that are configurable at runtime), I think you've gone a little 
beyond the scope of ANTLR's lexer.

But it's fairly easy to roll your own lexer that takes care of all 
that and then feed the resulting tokens into an ANTLR parser. 
Bear in mind that you'll still need to localise error messages 
and the like once you get there (though that's usually fairly 
straightforward).
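
To make the idea concrete, here is a rough sketch of such a hand-rolled 
lexer in Python. Everything in it is an assumption made up for 
illustration: the keyword tables, the token type names, the `tokenize` 
function, and the simplification that words are separated by whitespace 
(which real Japanese input generally is not). It only shows the core 
trick, which is that each locale maps its own keyword spellings onto the 
same locale-independent token types, so one parser can serve them all. 
Wrapping the output in whatever token-source interface your ANTLR target 
expects is left out.

```python
import re

# Hypothetical per-locale keyword tables. Each maps a localized spelling
# to a locale-independent token type, which is all the parser would see.
# These could just as well be loaded from a file at runtime.
KEYWORDS = {
    "en": {"if": "IF", "then": "THEN", "else": "ELSE"},
    "ja": {"もし": "IF", "ならば": "THEN", "でなければ": "ELSE"},
}

# Skip leading whitespace, then match a number, a word, or any single
# other character. In Python 3, \w and \d are Unicode-aware for str.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(.))")


def tokenize(text, locale):
    """Return a list of (token_type, text) pairs for the given locale."""
    keywords = KEYWORDS[locale]
    tokens = []
    for match in TOKEN_RE.finditer(text):
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif word is not None:
            # A word is a keyword only if this locale's table says so;
            # otherwise it is an ordinary identifier.
            tokens.append((keywords.get(word, "IDENT"), word))
        elif not symbol.isspace():
            # Trailing whitespace can land here; ignore it.
            tokens.append(("SYMBOL", symbol))
    return tokens


if __name__ == "__main__":
    print(tokenize("if x then 1 else 2", "en"))
    print(tokenize("もし x ならば 1 でなければ 2", "ja"))
```

Both calls yield the same sequence of token types, differing only in the 
token text, so adding a language means adding a table rather than 
writing another lexer.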