[antlr-interest] Lexer issue with Python target and predicates

Thu May 17 15:31:21 PDT 2007

I have a grammar for which ANTLR generates syntactically invalid Python
code. Both semantic and syntactic predicates seem to trigger the problem.

I've reduced the grammar to a minimal subset that demonstrates the problem:

synpred.g:
--------8<--------8<--------8<--------
grammar synpred;

options {
	language=Python;
}

// matches input of the form aaa.a.a or aaa|aaa

SLASH	:	'\\';
DOLLAR	:	'$';
HASH	:	'#';
LCURLY	:	'{';

startRule : LiteralExpression+;

LiteralExpression
	: { literalText=True; }
	  (LiteralComponent)* (DOLLAR|HASH)?
	  { literalText=False; }
	;

fragment
LiteralComponent
    : {literalText}? => ( options { greedy=true; } : (
        (SLASH) => SLASH (DOLLAR | HASH)
      | (DOLLAR | HASH) => (DOLLAR | HASH) ~(LCURLY)
      | ~(DOLLAR | HASH)
    ))+
    ;
--------8<--------8<--------8<--------

Generate lexer/parser:
--------8<--------8<--------8<--------
$ java org.antlr.Tool synpred.g
ANTLR Parser Generator  Version 3.0b7 (April 12, 2007)  1989-2007
warning(11):  internal warning: ignoring unsupported option: seperator
warning(11):  internal warning: ignoring unsupported option: seperator
--------8<--------8<--------8<--------

(I don't know if those warnings are relevant; I always get them, even on
grammars which produce working parsers...)

The resulting lexer generated from this grammar contains Python
'statements' like this:

     elraise NotImplementedError("eotDFAEdge")

I'm not sure why this happens or how to fix it. Manually replacing each
'elraise' with 'else: raise' makes the lexer syntactically valid Python.
With the full grammar, however, the generated lexer is over 28 MB of
Python (!) and can't be imported :-(
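
For reference, the manual replacement can be scripted. This is only a
sketch of the workaround, not a fix for the code generator: it assumes
every broken statement begins with 'elraise' at the start of a line, and
the sample fragment below is made up to resemble the generated code. The
helper names are my own. It does nothing about the size of the full lexer.

```python
import re

# Matches the malformed keyword at the start of a statement and captures
# its indentation. Assumption: the generator fused "else:" and "raise".
_ELRAISE = re.compile(r"^([ \t]*)elraise\b", re.MULTILINE)


def fix_elraise(source):
    """Return source with each leading 'elraise' rewritten as 'else: raise'."""
    return _ELRAISE.sub(r"\1else: raise", source)


def is_valid_python(source):
    """Report whether source compiles, without executing it."""
    try:
        compile(source, "<generated lexer>", "exec")
    except SyntaxError:
        return False
    return True


# Hypothetical fragment shaped like the generated lexer code.
sample = (
    "def edge(s):\n"
    "    if s == 0:\n"
    "        return 1\n"
    "    elraise NotImplementedError(\"eotDFAEdge\")\n"
)

fixed = fix_elraise(sample)
print(is_valid_python(sample), is_valid_python(fixed))
```

To patch a real file, read the generated lexer module as text, pass it
through fix_elraise(), and write the result back; checking the output with
is_valid_python() first avoids overwriting the file with something that
still fails to compile.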

Any help or suggestions?

L.