[antlr-interest] Common token vocabulary

Wed Aug 22 14:53:04 PDT 2012

I have an ANTLR-based application that parses either of two variations of a target language. The two variations share most of their vocabulary: around 200 tokens are common to both, and fewer than 10 are specific to one variation or the other. I want the token types for the common vocabulary tokens to be the same in both language variations so that I can reference those token types in custom Java code that is shared by both variations.

I had a solution working with ANTLR 3.1 that put the common tokens in a base vocabulary lexer grammar. The base vocabulary grammar was imported into combined lexer/parser grammars, one for each language variation. ANTLR 3.1 generated token types in the order the tokens appeared, so the common vocabulary tokens had the same types in both language variations. The variation-specific tokens were then assigned larger numbers.

I'm currently attempting to upgrade this to ANTLR 3.4 and have run into a problem. ANTLR 3.4 seems to sort the tokens alphabetically by name before assigning token types. This interleaves the variation-specific tokens with the common vocabulary tokens, so the type numbers for the common tokens no longer match between the two variations.
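To make the mismatch concrete, here is a small standalone Java sketch. It is not ANTLR code and does not call the ANTLR tool; it only mimics the two numbering schemes as I understand them. The token names (IDENT, NUMBER, STRING, ARROW, PRAGMA) and the class and method names are made up for illustration, and the starting type of 4 is my assumption about the first user token type in ANTLR 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TokenTypeDemo {
    // Assumed first token type available to user-defined tokens.
    static final int MIN_TOKEN_TYPE = 4;

    // Number tokens in the order they are listed (the behavior I saw with 3.1).
    static Map<String, Integer> assignInOrder(List<String> names) {
        Map<String, Integer> types = new LinkedHashMap<String, Integer>();
        int next = MIN_TOKEN_TYPE;
        for (String name : names) {
            if (!types.containsKey(name)) {
                types.put(name, next++);
            }
        }
        return types;
    }

    // Number tokens after sorting by name (what 3.4 appears to do).
    static Map<String, Integer> assignSorted(List<String> names) {
        return assignInOrder(new ArrayList<String>(new TreeSet<String>(names)));
    }

    public static void main(String[] args) {
        // Hypothetical vocabularies: three common tokens plus one
        // variation-specific token each, listed after the common ones.
        List<String> variationA = Arrays.asList("IDENT", "NUMBER", "STRING", "ARROW");
        List<String> variationB = Arrays.asList("IDENT", "NUMBER", "STRING", "PRAGMA");

        System.out.println("declaration order A: " + assignInOrder(variationA));
        System.out.println("declaration order B: " + assignInOrder(variationB));
        System.out.println("sorted by name    A: " + assignSorted(variationA));
        System.out.println("sorted by name    B: " + assignSorted(variationB));
    }
}
```

With declaration-order numbering, IDENT, NUMBER and STRING get 4, 5 and 6 in both variations. With name-sorted numbering, ARROW sorts ahead of IDENT in variation A, so IDENT becomes 5 there but stays 4 in variation B, which is the kind of mismatch that breaks my shared Java code.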

Is there a better way to accomplish this in ANTLR 3.4? Thanks.

-rich