[antlr-interest] Reuse of tokens and rules

Fri Jun 30 07:22:26 PDT 2006

Hello,

I'm working on a project that needs to parse embedded code written in a 
selected language, similar to actions in ANTLR. To do this correctly, the 
lexer has to take comments, strings and character literals into account. For 
example, when matching code between two curly brackets, brackets inside a 
comment or string should be ignored, so the lexer must know where each string 
or comment starts and ends. However, not all languages use the same syntax 
for strings and comments. Pascal uses '(* ... *)', '{ ... }' and '// ...' for 
comments, whereas Java uses '/* ... */' and '// ...'. Strings differ too: 
Pascal uses '...' whereas Java uses "...".
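
To illustrate the brace-matching problem, here is a rough sketch in Python 
rather than ANTLR grammar syntax. It is not an ANTLR feature or API; the 
names LanguageSpec, JAVA, PASCAL, skip_literal_or_comment and 
find_closing_brace are all hypothetical, and the comment and string syntax 
is reduced to what is described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSpec:
    """Hypothetical per-language description of comment and string syntax."""
    block_comments: tuple  # pairs of (opener, closer)
    line_comments: tuple   # prefixes that start a comment running to end of line
    string_quotes: tuple   # characters that open and close a string/char literal
    escape: str = ""       # escape character inside literals, "" if there is none


JAVA = LanguageSpec(
    block_comments=(("/*", "*/"),),
    line_comments=("//",),
    string_quotes=('"', "'"),
    escape="\\",
)

PASCAL = LanguageSpec(
    block_comments=(("(*", "*)"), ("{", "}")),
    line_comments=("//",),
    string_quotes=("'",),  # a doubled '' simply scans as two adjacent strings
)


def skip_literal_or_comment(text, i, spec):
    """Return the index just past a comment or literal starting at i, else i."""
    for opener, closer in spec.block_comments:
        if text.startswith(opener, i):
            end = text.find(closer, i + len(opener))
            return len(text) if end == -1 else end + len(closer)
    for prefix in spec.line_comments:
        if text.startswith(prefix, i):
            end = text.find("\n", i)
            return len(text) if end == -1 else end + 1
    if text[i] in spec.string_quotes:
        quote = text[i]
        j = i + 1
        while j < len(text):
            if spec.escape and text[j] == spec.escape:
                j += 2  # skip the escaped character
                continue
            if text[j] == quote:
                return j + 1
            j += 1
        return len(text)  # unterminated literal
    return i


def find_closing_brace(text, start, spec):
    """Return the index of the '}' matching the '{' at text[start], or -1."""
    if text[start] != "{":
        raise ValueError("text[start] must be '{'")
    depth = 1
    i = start + 1
    while i < len(text):
        j = skip_literal_or_comment(text, i, spec)
        if j != i:
            i = j
            continue
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1
```

Only the LanguageSpec value changes between languages; the scanning code is 
shared. Note that with the PASCAL spec a '{' inside the embedded code starts 
a comment instead of a nested block, which is exactly the kind of difference 
the lexer needs to know about.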

In this language I also need to be able to parse constructs such as type names 
and identifiers. These constructs also follow the syntax of the selected 
language: a type name in Java will be different from a type name in Pascal. 
This means I also need different parser rules for the various languages.
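
As a small illustration of how the same construct differs per language, the 
Python sketch below keeps one type-name rule per language in a table. The 
patterns are deliberately simplified assumptions (no generics, no Pascal 
subranges or other structured types), and TYPE_NAME_RULES and is_type_name 
are hypothetical names.

```python
import re

# Simplified identifier patterns; assumptions for illustration only.
JAVA_IDENT = r"[A-Za-z_$][A-Za-z0-9_$]*"
PASCAL_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# One type-name rule per selected language.
TYPE_NAME_RULES = {
    # Java: dotted qualified name, optionally followed by array brackets.
    "java": re.compile(rf"{JAVA_IDENT}(?:\.{JAVA_IDENT})*(?:\[\])*"),
    # Pascal: identifier, optionally qualified by a single unit name.
    "pascal": re.compile(rf"{PASCAL_IDENT}(?:\.{PASCAL_IDENT})?"),
}


def is_type_name(text, language):
    """Check text against the type-name rule of the selected language."""
    return TYPE_NAME_RULES[language].fullmatch(text) is not None
```

For example, is_type_name("int[]", "java") holds while 
is_type_name("int[]", "pascal") does not, although the rest of the parser 
would be identical for both.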

Is it possible to write small specifications for these constructs and import 
the one needed? These rules are only a small portion of all the rules, and I 
would rather not maintain a copy of the same parser and lexer for each 
target language.

PS. I'm using ANTLR v3.

Best regards,
Emond Papegaaij