[antlr-interest] Composite Grammars

Sun Dec 25 17:11:30 PST 2011

Sorry about all the bug reports, but I keep finding more. :P

Setting output=AST causes some TreeAdaptor-related code to be inserted
into the parser, but some of those lines are indented inconsistently.
At first this looked like it should break Python but not Java. It
turns out the generated Python uses a single tab character on those
lines where the surrounding code uses two levels of 4 spaces, and I
have vim display hard tabs as 4 columns wide; Python 2 expands a tab
to the next multiple of 8 columns, so it actually lines up. In Java
there is simply no indentation on those lines, even though they are
inside a class definition, which javac doesn't care about.
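
To see why a hard tab only looked wrong, here is a small Python check.
It is a sketch: the class and method names in SOURCE are hypothetical,
and it assumes the Python 2 runtime the ANTLR 3 Python target used at
the time, whose tokenizer advances a tab to the next multiple of 8
columns (the same thing str.expandtabs() does with its default tab
size). Under Python 3 the same mix of tabs and spaces is rejected with
a TabError, which the last part shows.

```python
# SOURCE mimics the generated code: one line indented with a single hard tab,
# the line above it with 8 spaces (two 4-space levels). Names are hypothetical.
SOURCE = (
    "class GParser(object):\n"
    "    def setTreeAdaptor(self, adaptor):\n"
    "        self._adaptor = adaptor\n"
    "\tself.gC = None\n"
)

# Python 2's tokenizer advances a tab to the next multiple of 8 columns, which
# is also what str.expandtabs() does with its default tab size.
tab_width_default = len("\t".expandtabs())  # 8 columns: same as two 4-space levels
tab_width_vim = len("\t".expandtabs(4))     # 4 columns: how vim with tabstop=4 shows it

# Python 3 rejects indentation whose meaning depends on the tab size.
try:
    compile(SOURCE, "<generated>", "exec")
    python3_result = "compiled"
except TabError as exc:
    python3_result = "TabError: %s" % exc.msg

print(tab_width_default, tab_width_vim, python3_result)
```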

And when setting the adaptor on the delegates, the generated code
doesn't refer to them correctly: it uses gC and gD instead of self.gC
and self.gD, which breaks Python but not Java (where a bare field name
in an instance method already means this.gC). The Java output is also
missing a newline between the setTreeAdaptor calls, but that's not a
functional problem.
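
A minimal Python sketch of the failure, assuming heavily simplified
stand-ins: Delegate, BrokenParser and FixedParser are hypothetical
names, and the real generated G_C/G_D constructors take more
arguments. Inside a Python method a bare gC is looked up as a local or
global name, not as an attribute, so the call raises NameError only
when setTreeAdaptor actually runs.

```python
class Delegate(object):
    """Hypothetical stand-in for a delegate grammar such as G_C or G_D."""

    def __init__(self, name):
        self.name = name
        self.adaptor = None

    def setTreeAdaptor(self, adaptor):
        self.adaptor = adaptor


class BrokenParser(object):
    """Mimics the generated code: delegates are referenced as bare names."""

    def __init__(self):
        self.gC = Delegate("G_C")
        self.gD = Delegate("G_D")

    def setTreeAdaptor(self, adaptor):
        self._adaptor = adaptor
        gC.setTreeAdaptor(self._adaptor)  # NameError: gC is an attribute, not a name
        gD.setTreeAdaptor(self._adaptor)


class FixedParser(BrokenParser):
    """Same wiring, but the delegates are reached through self."""

    def setTreeAdaptor(self, adaptor):
        self._adaptor = adaptor
        self.gC.setTreeAdaptor(self._adaptor)
        self.gD.setTreeAdaptor(self._adaptor)


parser = FixedParser()
parser.setTreeAdaptor("adaptor")
print(parser.gC.adaptor, parser.gD.adaptor)  # adaptor adaptor
```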

I sometimes make this latter kind of error myself when programming in
Python, and pychecker/pylint are pretty good at catching all of them
at once rather than one at a time. So I ran pychecker on the files
generated for my minimal grammar, and for the top-level lexer and
parser it found only the errors I reported here and in my previous
post...

But for G_C and G_D it stumbled upon a circular dependency: G_C tries
to import tokenNames from GParser, while GParser imports G_C from G_C.
Importing GParser first works, of course, since tokenNames is defined
before GParser attempts to load G_C. But Python's import semantics
won't let you load them the other way around, since GParser would then
ask for G_C from a G_C module that hasn't finished loading. (Going
through GParser works in pychecker, and pychecker only complains about
an unused local variable set1_tree in G_C and has no complaints about
G_D.)
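
The import cycle can be reproduced with two stand-in modules reduced to
just the import pattern described above. The module names mirror the
generated files, but their contents (including the tokenNames values)
are hypothetical. Importing GParser first succeeds because tokenNames
already exists when G_C asks for it; importing G_C first fails because
GParser then asks for G_C from a module that hasn't finished executing.

```python
import importlib
import os
import sys
import tempfile

# Stand-ins for the generated modules, reduced to the import pattern only.
# The module names mirror the post; the contents are hypothetical.
SOURCES = {
    "GParser.py": (
        "tokenNames = ['<invalid>', '<EOR>', '<DOWN>', '<UP>', 'COMMA']\n"
        "from G_C import G_C  # runs after tokenNames is defined\n"
    ),
    "G_C.py": (
        "from GParser import tokenNames  # the delegate needs the parent's token names\n"
        "class G_C(object):\n"
        "    pass\n"
    ),
}
MODULES = ("GParser", "G_C")


def try_import(first, directory):
    """Import `first` from a clean state; return None on success, else the error."""
    for name in MODULES:
        sys.modules.pop(name, None)
    importlib.invalidate_caches()
    sys.path.insert(0, directory)
    try:
        importlib.import_module(first)
        return None
    except ImportError as exc:
        return str(exc)
    finally:
        sys.path.remove(directory)
        for name in MODULES:
            sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    for filename, source in SOURCES.items():
        with open(os.path.join(tmp, filename), "w") as f:
            f.write(source)
    gparser_first = try_import("GParser", tmp)  # None: the cycle resolves
    g_c_first = try_import("G_C", tmp)          # ImportError message about G_C

print("GParser first:", gparser_first)
print("G_C first:", g_c_first)
```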

In any case, because pychecker found nothing else, once I work around
the second issue here and the one from my last post, I can probably
actually start parsing. :)

On Sat, Dec 24, 2011 at 5:44 PM, Benjamin S Wolf <jokeserver at gmail.com> wrote:
> Actually, there are still issues here. Namely, GLexer is trying to use
> both A and B directly as delegates, but never initializes the A
> delegate for G_B_A. G_B does, which leads me to believe that this can
> be solved in the constructor by adding "gA = gB.gA" in the Java case,
> "self.gA = self.gB.gA" for Python, "ctx->gA = ctx->gB->gA" for C, etc.
> But then again, G_B is delegating to G_B_A; why then does GLexer want
> to delegate directly to G_B_A?
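
A sketch of that constructor fix in Python, with the classes cut down
to just the delegate wiring. The constructor signatures here are
hypothetical (the real generated lexers also take an input stream and
shared recognizer state); the point is only that GLexer reuses the
G_B_A instance G_B already created instead of leaving gA unset or
building a second one.

```python
# Hypothetical, heavily simplified stand-ins for the generated lexers: only the
# delegate wiring is modeled.
class G_B_A(object):
    def __init__(self, gLexer, gB):
        self.gLexer = gLexer  # the root lexer
        self.gB = gB          # the grammar that imported A


class G_B(object):
    def __init__(self, gLexer):
        self.gLexer = gLexer
        self.gA = G_B_A(gLexer, self)  # G_B already sets up its A delegate


class GLexer(object):
    def __init__(self):
        self.gB = G_B(self)
        # The generated GLexer uses gA but never assigns it; the proposed fix
        # reuses G_B's delegate rather than creating a second G_B_A.
        self.gA = self.gB.gA


lexer = GLexer()
print(lexer.gA is lexer.gB.gA)  # True
```
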
>
> (Attached GLexer.java and the full grammar in G.zip.)
>
> On Fri, Dec 23, 2011 at 9:49 PM, Benjamin S Wolf <jokeserver at gmail.com> wrote:
>> I've gotten some very strange errors while trying to make a composite
>> grammar, and I think I've figured out why and/or a way around it. I'm
>> posting this because the error messages were not that helpful on their
>> own, and I had to fool around for a while with a minimal test case
>> until I found a way out of the errors.
>>
>> I have a composite grammar G, which imports two disjoint lexer
>> grammars A and B, and a parser grammar C (which only requires the
>> tokens from A). Running ANTLR 3.4 on G, with the various changes
>> described below, gives one of the following sets of errors,
>> regardless of the output option or target language.
>>
>> 1. G has no rules.
>>
>> 2. parser rule ... not allowed in lexer, lexer rule ... not allowed in
>> parser, etc.
>>
>> 3. java.lang.ClassCastException: org.antlr.runtime.tree.CommonTree
>> cannot be cast to org.antlr.tool.GrammarAST.
>>
>> The short answer (before I go into details below) is that a) G needs a
>> parser rule, not just lexer rules, and b) G should import only one
>> lexer grammar directly, and the others should be imported through
>> that one. Strangely, b) does not apply to parser grammars: I added a
>> second parser grammar D (dependent on both A and B) to test this, and
>> G is fine* either way.
>>
>> The long story: When I encountered (1), I added a dummy lexer rule
>> "COMMA : ',' ;". This cured G's lack of rules but now antlr3.4 was
>> giving me (2), where it seemed that antlr3 thought I was putting all
>> of A's lexer rules in C and all of C's parser rules in A (and B,
>> etc.). Since I had no rules dependent on B, I removed it from being
>> imported. With G importing only A and C, I was now getting (3). I
>> added the rule "comma : COMMA ;" to G and now antlr3 completed
>> successfully (and still did when I folded these two rules together
>> into "comma : ',' ;"). So I added B back to G's import list, and it
>> gave me (2) again. But removing B from G's import list and having A
>> import it instead made it work fine.
>>
>> So antlr3 successfully produces a recognizer for G when G imports A,
>> C, and D, where A imports B, or when G imports B, C, and D, and B
>> imports A**.
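
To keep track of which layouts worked, the observations above can be
written down as a small check. This is a hypothetical helper, not part
of ANTLR: it encodes only the workaround found in this thread with
ANTLR 3.4 (the top-level grammar imports at most one lexer grammar
directly, and every lexer grammar whose tokens an imported parser
grammar uses is reachable through that one's imports). It cannot
detect the missing-parser-rule problem (1), and which tokens each
parser grammar uses has to be supplied by hand.

```python
# Hypothetical model of a composite grammar: name -> (kind, imported names),
# where kind is "combined", "lexer" or "parser". Encodes only the workaround
# observed with ANTLR 3.4 in this thread, not a documented ANTLR rule.


def lexer_chain(grammars, start):
    """Return every lexer grammar reachable from `start` via lexer-to-lexer imports."""
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        _kind, imports = grammars[name]
        stack.extend(i for i in imports if grammars[i][0] == "lexer")
    return seen


def check_composite(grammars, root, token_sources):
    """Return a list of problems with the import layout under `root`.

    token_sources maps each parser grammar to the lexer grammars whose
    tokens it uses.
    """
    problems = []
    imports = grammars[root][1]
    lexers = [i for i in imports if grammars[i][0] == "lexer"]
    if len(lexers) > 1:
        problems.append("%s imports %d lexer grammars directly (%s)"
                        % (root, len(lexers), ", ".join(lexers)))
    # Tokens the top-level lexer can produce: everything in the lexer chains.
    reachable = set()
    for name in lexers:
        reachable |= lexer_chain(grammars, name)
    for parser in [i for i in imports if grammars[i][0] == "parser"]:
        for lexer in token_sources.get(parser, []):
            if lexer not in reachable:
                problems.append("tokens from %s used by %s are unreachable"
                                % (lexer, parser))
    return problems


# The layouts tried in this thread (C uses A's tokens, D uses A's and B's).
tokens = {"C": ["A"], "D": ["A", "B"]}
both = {"G": ("combined", ["A", "B", "C", "D"]), "A": ("lexer", []),
        "B": ("lexer", []), "C": ("parser", []), "D": ("parser", [])}
chained = dict(both, G=("combined", ["A", "C", "D"]), A=("lexer", ["B"]))
dropped = dict(both, G=("combined", ["A", "C", "D"]))

print(check_composite(both, "G", tokens))     # ['G imports 2 lexer grammars directly (A, B)']
print(check_composite(chained, "G", tokens))  # []
print(check_composite(dropped, "G", tokens))  # ['tokens from B used by D are unreachable']
```
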
>>
>> I am not sure of the root reason why the top level of a composite
>> grammar can't import two lexer grammars (e.g. whether it's a design
>> decision or a bug), as none of the documentation I could find on
>> composite grammars says either that this is the case or that it
>> should be otherwise. I would have liked a better error message in
>> place of (2), at least for the case where G had a lexer rule but not
>> a parser rule, because it would have saved a little bit of stumbling
>> around.
>>
>> *By "fine" I mean antlr3 finishes successfully. But if G doesn't
>> import B, then the generated lexer can't produce tokens defined in B
>> and so the rules in D can't be reached.
>>
>> **Unless you're like me and have an unfortunately large lexer grammar
>> B, which causes antlr3 to run out of stack space if G imports A, which
>> imports B, but not if G imports B, which imports A.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GParser.py
Type: application/octet-stream
Size: 3002 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20111225/ffa2bd76/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GParser.java
Type: application/octet-stream
Size: 3231 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20111225/ffa2bd76/attachment-0001.obj