[antlr-interest] Good practice for grammar with translated keywords

Thomas Brandon tbrandonau at gmail.com
Thu Mar 12 08:17:46 PDT 2009


If you know ahead of time which language will be used, and every keyword
is also a valid identifier, you could use an action in your identifier
rule to look the matched text up in a language-specific hashtable mapping
text to token type, and set the resulting token type from that, e.g.

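@lexer::header {
  import java.util.Hashtable;
}

@lexer::members {
  // Assumption: the language code is fixed before lexing starts; how it
  // gets set (constructor argument, setter, ...) is up to the application.
  private String language = "en";

  private final Hashtable<String,Integer> literalsTable_en =
      new Hashtable<String,Integer>() {{
    put("if", IF);
    put("then", THEN);
  }};

  private final Hashtable<String,Integer> literalsTable_fr =
      new Hashtable<String,Integer>() {{
    put("si", IF);
    put("alors", THEN);
  }};

  // Return the appropriate hashtable for the language
  private Hashtable<String,Integer> getLiteralsTable() {
    return "fr".equals(language) ? literalsTable_fr : literalsTable_en;
  }

  // Map keyword text to its token type; anything else stays an ID
  private int checkLiterals(String text) {
    Integer type = getLiteralsTable().get(text);
    return type != null ? type : ID;
  }
}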

ID: ('a'..'z')+ { $type = checkLiterals($text); };

fragment IF: ;
fragment THEN: ;

NB: You need the fragment rules to define the token types, because
declaring them in the tokens section causes warnings (at least it did
the last time I checked).
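
To try the lookup logic outside of ANTLR, here is a minimal standalone
sketch in plain Java. The class name KeywordLookup, the language codes
"en"/"fr" and the token type values 4, 5 and 6 are assumptions made up
for illustration; in a real lexer the token types are whatever ANTLR
generates for ID, IF and THEN.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordLookup {
    // Hypothetical token types; a generated ANTLR lexer assigns its own values.
    public static final int ID = 4;
    public static final int IF = 5;
    public static final int THEN = 6;

    // One keyword table per language code (the codes are an assumption).
    private static final Map<String, Map<String, Integer>> TABLES =
            new HashMap<String, Map<String, Integer>>();

    static {
        Map<String, Integer> en = new HashMap<String, Integer>();
        en.put("if", IF);
        en.put("then", THEN);
        TABLES.put("en", en);

        Map<String, Integer> fr = new HashMap<String, Integer>();
        fr.put("si", IF);
        fr.put("alors", THEN);
        TABLES.put("fr", fr);
    }

    // Returns the keyword token type of text in the given language,
    // or ID when text is not a keyword of that language.
    public static int tokenType(String language, String text) {
        Map<String, Integer> table = TABLES.get(language);
        if (table == null) {
            throw new IllegalArgumentException("Unknown language: " + language);
        }
        Integer type = table.get(text);
        return type != null ? type : ID;
    }

    public static void main(String[] args) {
        // "si" is a keyword only when the French table is active.
        System.out.println("fr/si -> " + tokenType("fr", "si"));
        System.out.println("en/si -> " + tokenType("en", "si"));
        System.out.println("en/if -> " + tokenType("en", "if"));
    }
}
```

Because only one table is consulted per run, "si" is an ordinary
identifier when English is selected, so languages cannot be mixed in
one input.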

Tom.

On Fri, Mar 13, 2009 at 1:50 AM, Olivier THIERRY
<olivier.thierry at gmail.com> wrote:
> Hi,
>
> I need to write a grammar whose keywords will be translated into
> English, French, Spanish, ...
> I then use StringTemplate to transform this language into Groovy script.
>
> For example, I would have the following statement in English:
>
> IF (i = 0) THEN
>
> And the following in French:
>
> SI (i = 0) ALORS
>
> To do this I thought about writing:
> - several lexer grammars for keywords (i.e. translated tokens), one
> lexer grammar for each language
> - one lexer grammar for untranslated tokens
> - one parser grammar that would import the untranslated-tokens lexer
> grammar and one of the translated-tokens lexer grammars.
>
> Actually only the first lexer grammar is language-specific; the other
> ones are common.
> But I can't find the right way to do this, since tokens have to be
> imported into the parser grammar. That means you would have a parser
> grammar for each language.
>
> I also thought about using alternatives in the keyword token
> definitions, something like: IF : 'IF' | 'SI';
> But that means you could mix languages, e.g.: IF (i=0) ALORS
>
> If anyone has had the same need, how did you achieve this?
>
> I use ANTLR 3, ANTLRWorks and the ANTLR 3 Maven plugin.
>
> Thanks in advance for any help!
>
> Olivier