[antlr-interest] Reuse of tokens and rules
Emond Papegaaij
e.papegaaij at student.utwente.nl
Fri Jun 30 07:22:26 PDT 2006
Hello,
I'm working on a project that needs to parse embedded code written in a
selected language, similar to actions in ANTLR. The problem is that, to do
this right, the lexer has to take comments, strings and character literals
into account. For example, when matching code between two curly brackets,
brackets inside a comment or string should be ignored. This means the lexer
has to know where a string or comment starts and ends. However, not all
languages use the same syntax for strings and comments. For example, Pascal
uses '(* ... *)', '{ ... }' and '// ...' for comments, whereas Java uses
'/* ... */' and '// ...'. For strings it's '...' (Pascal) vs. "..." (Java).
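
To make the problem concrete, here is a minimal sketch in Python (not ANTLR)
of brace matching driven by a small per-language table. The names LexicalSpec,
JAVA, PASCAL, skip_hidden and find_closing_brace are all made up for this
illustration, and the tables only cover the delimiters mentioned above, so
treat it as a rough model of the behaviour I'm after rather than a real lexer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalSpec:
    """Hypothetical description of what can hide a brace in one language."""
    block_comments: tuple  # (open, close) delimiter pairs
    line_comments: tuple   # prefixes of comments that run to end of line
    quotes: tuple          # (quote character, escape character or None)


# Assumed tables, limited to the syntax mentioned in this message.
JAVA = LexicalSpec(
    block_comments=(("/*", "*/"),),
    line_comments=("//",),
    quotes=(('"', "\\"), ("'", "\\")),
)
PASCAL = LexicalSpec(
    block_comments=(("(*", "*)"), ("{", "}")),
    line_comments=("//",),
    quotes=(("'", None),),  # '' inside a string is seen as two adjacent strings
)


def skip_hidden(text, i, spec):
    """Return the index just past a comment or string starting at i, else i."""
    for open_, close in spec.block_comments:
        if text.startswith(open_, i):
            end = text.find(close, i + len(open_))
            if end < 0:
                raise ValueError("unterminated block comment")
            return end + len(close)
    for prefix in spec.line_comments:
        if text.startswith(prefix, i):
            end = text.find("\n", i)
            return len(text) if end < 0 else end
    for quote, escape in spec.quotes:
        if text[i] == quote:
            j = i + 1
            while j < len(text):
                if escape is not None and text[j] == escape:
                    j += 2  # skip the escaped character
                elif text[j] == quote:
                    return j + 1
                else:
                    j += 1
            raise ValueError("unterminated string or character literal")
    return i


def find_closing_brace(text, start, spec):
    """Return the index of the '}' that closes the '{' at text[start]."""
    if text[start] != "{":
        raise ValueError("expected '{' at start")
    depth = 1
    i = start + 1
    while i < len(text):
        j = skip_hidden(text, i, spec)
        if j != i:
            i = j  # a comment or string was skipped; braces in it are ignored
            continue
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")
```

Calling find_closing_brace(source, 0, JAVA) or find_closing_brace(source, 0,
PASCAL) on the same kind of input then differs only in the table passed in.
Note that with the Pascal table an inner '{' always starts a comment, so
braces never nest there.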
In my own language I also need to parse constructs such as type names and
identifiers, and these follow the syntax of the selected language as well. A
type name in Java is different from a type name in Pascal, so I also need
different parser rules for the various languages.
Is it possible to write small specifications for these constructs and import
only the one that is needed? These rules are just a small portion of all the
rules, and I would rather not maintain a nearly identical copy of the whole
parser and lexer for each target language.
PS. I'm using ANTLR v3.
Best regards,
Emond Papegaaij