[antlr-interest] Non-LL(*) decision, different behaviour in ANTLRWorks vs. ANTLR (command line)

Thu May 28 11:03:30 PDT 2009

Martin Probst wrote:
> Hi all,
>
> I have trouble with a grammar I'm developing. Everything worked fine,
> now I'm integrating some additional language constructs, and ANTLR
> doesn't like them. You can find the grammar here (sorry, it's pretty
> huge):
>
> http://code.google.com/p/xqpretty/source/browse/trunk/src/main/antlr3/com/martinprobst/xqpretty/XQuery.g
>
> The problem is that with the grammar as it is, ANTLR 3.1.3 reports
> "[fatal] rule stepExpr has non-LL(*) decision", and then some more of
> the same. I've spent some time tracking this, and I found that
> ANTLRWorks does not give these errors (even when generating code!) and
> also interprets the grammar correctly. So, is there any difference in
> how ANTLRWorks and ANTLR run the generator? Can I emulate that on the
> command line?
>
> I can get rid of the non-LL(*) warnings by modifying two lines, line
> 546 (ftWordsValue):
>
> ftWordsValue
>     : literal; // | (LCURLY expr RCURLY);
>
> and line 540 (ftPrimaryWithOptions):
>
> ftPrimaryWithOptions
>     : ftPrimary; // ftMatchOptions?;
It looks like you have tried to type up the grammar from the normative 
spec, without reorganizing it into an LL form. You will end up with 
myriad errors/problems that way.

If you just ask ANTLR Works to "Check Grammar", then it won't try to 
generate the code and often says nothing. If you Do Run->debug, you will 
find lots of errors are given out, not least of which is the missing 
lexer tokens:

[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:302:62: no 
lexer rule corresponding to token: ElementContentChar
[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:291:54: no 
lexer rule corresponding to token: CLOSE_TAG
[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:300:7: no 
lexer rule corresponding to token: AposAttrContentChar
[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:298:7: no 
lexer rule corresponding to token: QuotAttrContentChar
[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:304:59: no 
lexer rule corresponding to token: ESCAPE_CURLY_CLOSE
[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:304:39: no 
lexer rule corresponding to token: ESCAPE_CURLY_OPEN
[10:43:39] warning(105): /home/jimi/tmp/antlr/xquery/XQuery.g:291:8: no 
lexer rule corresponding to token: EMPTY_CLOSE_TAG

You are using some of these as parsing tokens, but have just declared 
them as imaginaries. Make them empty fragments in the lexer if you are 
changing lexer tokens to these types, then the warnings will go away.

The following errors are legitimate

[10:43:40] error(211): /home/jimi/tmp/antlr/xquery/XQuery.g:230:5: 
[fatal] rule stepExpr has non-LL(*) decision due to recursive rule 
invocations reachable from alts 1,2.  Resolve by left-factoring or using 
syntactic predicates or using backtrack=true option.

filterExpr just starts with primaryExpr, which means it ends up back at 
stepExpr.

When the spec says something like stepExpr, you need to reinterpret this 
as expression, then write a proper LL expression tree, using it for all 
casee. It is your tree walking AST phase (that would come next) that 
should verify if the expression is valid for a stepExpr.

[10:43:40] error(211): /home/jimi/tmp/antlr/xquery/XQuery.g:541:17: 
[fatal] rule ftPrimaryWithOptions has non-LL(*) decision due to 
recursive rule invocations reachable from alts 1,2.  Resolve by 
left-factoring or using syntactic predicates or using backtrack=true option.

Basically, same as above, it all needs to be left factored.

[10:43:40] warning(200): /home/jimi/tmp/antlr/xquery/XQuery.g:541:17: 
Decision can match input such as "CASE INSENSITIVE RETURN SLASH" using 
multiple alternatives: 1, 2

You have: ftMatchOptions?

That means the parser could take that path immediately, or not take it 
and take it next time in some way. This is basically all symptomatic of 
the same problem.

... and so on.

Basically, I hate to tell you this, but you really need to view this as 
a learning exercise and start again with the expression tree. Add one 
step at a time, starting at the bottom (primary) and working your way up 
the tree. Then you can start to fill in the main syntax when you 
expression tree is correct.

Of course, if you are just needing something quick and dirty, and don't 
care about good syntax error messages, you could try turning on global 
backtracking, which will likely work it out at runtime. However, it may 
not of course as there are quite a few structural issues here.

Jim