[antlr-interest] Bug report: Composite grammar with all tokens defined in Lexer

George S. Cowan cowang at comcast.net
Tue Dec 23 14:39:41 PST 2008


Gavin Lambert said:
>It's illegal to use quoted strings in a parser-only grammar.

On p. 81 of the ANTLR Reference, in Figure 4.1, it says that a literal can
be used in any grammar. Is that the same thing that you mean by a "quoted
string"? I wonder, is there a difference here between ANTLR 2 and ANTLR 3?

Also, the error still happens when I substitute the token/rule-name for each
literal in the parser grammar. So I don't think this is the problem.
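
A minimal sketch of that substitution, applied to the DebugParse.g quoted
below. PUBLIC is the lexer rule mentioned in the original report and
IDENTIFIER comes from the quoted grammar; INTERFACE is an assumed name for a
lexer rule matching 'interface', since the lexer grammar is not shown in this
message.

------------ begin DebugParse.g (sketch) -----------
parser grammar DebugParse;

// Token references replace the quoted literals 'public' and 'interface'.
// INTERFACE is an assumed rule name; the real lexer may call it something else.
modifiers
     :
     (   PUBLIC
     )*
     ;

interfaceHeader
     :   modifiers INTERFACE IDENTIFIER
     ;
------------- end DebugParse.g (sketch) ------------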

>It'll most definitely produce an unparsable grammar if 
>you do so when your lexer defines proper rules for these strings, 
>as the token type produced by the lexer will be different than the 
>token type the parser is using to match.

I've always felt uncomfortable about this, and I haven't seen a clear
statement on it in any documentation. But I still don't think we have the
problem nailed down here because the problem is with the generation of the
lexer code, not its execution. Also, when I remove ONE lexer rule, the
lexer is generated and works on my tests. So, in that case, the lexer and
parser must be agreeing on the token types.

>Also, if this were a standalone grammar then you'd need to use the 
>tokenVocab option in the parser.  I'm not sure if this is 
>necessary or not when you're importing it into another grammar, 
>though.

Yes, but in http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars,
right before the last example, Terence says 'Parser grammars don't need to
explicitly import the lexer grammar they rely on, this is done only once in
the root composite grammar which "glues" its dependant grammars.'
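
For illustration, the layout that passage describes might look roughly like
the sketch below. The names Debug and DebugLex are inferred from the missing
file Debug_DebugLex.java. The start rule, the INTERFACE rule, the WS rule and
the body of IDENTIFIER are assumptions made only to keep the sketch complete.
It shows where the import goes and is not claimed to reproduce or avoid the
reported problem.

------------ begin Debug.g (sketch) -----------
grammar Debug;

// The import appears once, here in the root grammar.
// DebugParse itself has no import and no tokenVocab option.
import DebugLex, DebugParse;

// Hypothetical start rule, not taken from the original grammar.
compilationUnit
     :   interfaceHeader EOF
     ;
------------- end Debug.g (sketch) ------------

------------ begin DebugLex.g (sketch) -----------
lexer grammar DebugLex;

// PUBLIC is the rule named in the original report.
PUBLIC      :   'public' ;

// Assumed rules, simplified for the sketch.
INTERFACE   :   'interface' ;
IDENTIFIER  :   ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS          :   (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
------------- end DebugLex.g (sketch) ------------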

I feel a little like I'm playing "Gotcha" here, Gavin, and I do want you to
know that I appreciate your taking the time to help me think through what's
going on.

George



-----Original Message-----
At 06:52 23/12/2008, George S. Cowan wrote:
>Using ANTLR 3.1.1 on Windows XP, I was unable to split Yang 
>Jiang's java.g 
>(http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g) 
>into separate parser and lexer grammars for a composite grammar. 
>The following grammar distilled from that one works, but not if 
>the PUBLIC rule is uncommented. The file Debug_DebugLex.java is 
>not generated; however, there is no warning or error message.
[...]
>------------ begin DebugParse.g -----------
>parser grammar DebugParse;
>
>modifiers
>     :
>     (   'public'
>     )*
>     ;
>
>interfaceHeader
>     :   modifiers 'interface' IDENTIFIER
>     ;
>------------- end DebugParse.g ------------

It's illegal to use quoted strings in a parser-only grammar.  (And 
it ought to produce an error, but it doesn't at the 
moment.)  It'll most definitely produce an unparsable grammar if 
you do so when your lexer defines proper rules for these strings, 
as the token type produced by the lexer will be different than the 
token type the parser is using to match.

Also, if this were a standalone grammar then you'd need to use the 
tokenVocab option in the parser.  I'm not sure if this is 
necessary or not when you're importing it into another grammar, 
though.


