[antlr-interest] Reuse of tokens and rules

Mon Jul 3 02:42:14 PDT 2006

Having all the language specific rules marked with fragment and
accessed through gated predicates should mean they will not cause any
ambiguity problems.
And I wasn't suggesting you forgo separate language-specific parsers
entirely; that example rule collected the whole action as a single
token in your main grammar. The text of the token is then parsed by a
separate language-specific parser. So parseAction would create the
appropriate parser for the given action language (specified in a
separate grammar file) and parse the text of the token with it. It
would then need to integrate the results into the main parse tree
somehow.
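
As a rough illustration of the dispatch parseAction would do, here is a
minimal Python sketch. The names parse_python_action, parse_java_action
and ACTION_PARSERS are hypothetical stand-ins for parsers generated from
the separate grammar files; they are not part of ANTLR.

```python
from typing import Callable, Dict


# Hypothetical stand-ins: a real version would instantiate the lexer and
# parser generated from each language-specific grammar file.
def parse_python_action(text: str) -> dict:
    return {"language": "python", "source": text.strip()}


def parse_java_action(text: str) -> dict:
    return {"language": "java", "source": text.strip()}


# One entry per supported action language (assumed registry).
ACTION_PARSERS: Dict[str, Callable[[str], dict]] = {
    "python": parse_python_action,
    "java": parse_java_action,
}


def parse_action(text: str, language: str) -> dict:
    """Pick the parser for the given action language and run it on the token text."""
    try:
        parser = ACTION_PARSERS[language]
    except KeyError:
        raise ValueError(f"no action parser registered for {language!r}") from None
    return parser(text)
```

How the returned result gets attached to the main parse tree is left
open here, as it is above.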
So, you'd have PYTHON_ACTION, JAVA_ACTION etc. and subrules handling
strings, comments and the like to properly detect the exit token but
otherwise not looking at the language at all. These would need to be
duplicated in each grammar file but are fairly basic and unlikely to
change if the set of target languages remains fixed. Then, if you need
more than just the unanalysed text of the rule, you parse that with
separate language-specific parsers, which would be reused among
separate grammars. So actually, rather than parsing in the ACTION
lexer rule, in your parser you'd have something like:
actionBlock: ACTION;
declarationActionBlock:
    a=ACTION { res = parseAction($a.text); doSomething(res); };
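
To show what "handling strings and comments only to detect the exit
token" amounts to, here is a small Python sketch of the scanning logic
for a Java-like action. It assumes brace-delimited actions, backslash
escapes in quoted literals, and // and /* */ comments; the function name
find_action_end is made up for this example, and a Python action would
need different rules (# comments, triple-quoted strings).

```python
def find_action_end(text: str, start: int) -> int:
    """Return the index of the '}' that closes the '{' at text[start].

    Only strings, character literals, comments and nested braces are
    recognised; everything else is skipped without being analysed.
    """
    if text[start] != "{":
        raise ValueError("expected '{' at start of action")
    depth = 0
    i = start
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in "\"'":
            # Skip a quoted literal, stepping over backslash escapes.
            i += 1
            while i < n and text[i] != ch:
                i += 2 if text[i] == "\\" else 1
        elif text.startswith("//", i):
            # Line comment: jump to the end of the line.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        elif text.startswith("/*", i):
            # Block comment: jump to the last character of the terminator.
            end = text.find("*/", i + 2)
            if end == -1:
                break
            i = end + 1
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated action block")
```

The text between start + 1 and the returned index is what would be
handed to the language-specific parser.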
I was mainly fleshing out Ter's point that while in some cases full
embedded lexers for picking out the embedded language blocks may be
warranted, in many cases they are a more complicated solution than is
needed. In the example they are warranted by the language complexity;
in your case they may be warranted by issues of reuse.

Tom.
On 7/3/06, Emond Papegaaij <e.papegaaij at student.utwente.nl> wrote:
> On Saturday 01 July 2006 21:20, Thomas Brandon wrote:
> > No problem, though note, as Ter says, that just having your lexer rule
> > handle the whole action block and then passing the resulting string to
> > another parser may be a more straightforward option.
> > So, to handle multiple languages you could have something like:
> > ACTION:
> >    (    {isPython}? a=PYTHON_ACTION
> >
> >    |    {isJava}? a=JAVA_ACTION
> >
> > ...
> >    ) { parseAction($a.text); }
> >
> > fragment PYTHON_ACTION ...;
>
> This will lead to many lexer rules matching almost everything, with a
> predicate to enable them. Right now I'm already hitting the limit of what
> ANTLR can handle with ambiguous lexers. I need several rules, both lexer and
> parser, for every language. Keeping them together in a single grammar file
> will make it a lot easier to maintain them. It also means I can reuse the
> same grammar file in multiple parsers (I actually need 2 parsers with
> embedded actions).
>
> Thanks for the help.
>
> Best regards,
> Emond
>