[antlr-interest] Internal error in 3.1; also, context-sensitive keyword sets

Tue Aug 5 14:25:46 PDT 2008

(1) When moving my work in progress from 3.0 to 3.1 (ANTLRWorks
1.2b5), I encountered an internal error. The grammar passes the Ctrl+R
check, but I get this exception when generating code for the parser:

java.lang.NullPointerException
org.antlr.analysis.NFAToDFAConverter.getPredicatesPerNonDeterministicAlt(NFAToDFAConverter.java:1601)
org.antlr.analysis.NFAToDFAConverter.tryToResolveWithSemanticPredicates(NFAToDFAConverter.java:1357)
org.antlr.analysis.NFAToDFAConverter.resolveNonDeterminisms(NFAToDFAConverter.java:1221)
org.antlr.analysis.NFAToDFAConverter.addDFAStateToWorkList(NFAToDFAConverter.java:968)
org.antlr.analysis.NFAToDFAConverter.findNewDFAStatesAndAddDFATransitions(NFAToDFAConverter.java:307)
org.antlr.analysis.NFAToDFAConverter.convert(NFAToDFAConverter.java:111)
org.antlr.analysis.DFA.<init>(DFA.java:233)
org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1206)
org.antlr.tool.Grammar.createLookaheadDFAs(Grammar.java:970)
org.antlr.tool.Grammar.createLookaheadDFAs(Grammar.java:920)
org.antlr.codegen.Target.performGrammarAnalysis(Target.java:114)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:293)
org.antlr.Tool.generateRecognizer(Tool.java:419)
org.antlr.Tool.process(Tool.java:286)
org.antlr.works.generate.CodeGenerate.generate(Unknown Source)
org.antlr.works.generate.CodeGenerate.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

>From the source, the cause of this NullPointerException looks like a
simple typo: && instead of ||. However, without that typo, I would
still get an internal error "no AST/token for nonepsilon target w/o
predicate", and I'm not sure what that would mean. (My grammar is at
http://hansprestige.com/inform/antlr/buggy-080508.zip if that helps.)

The language I'm trying to parse is fraught with ambiguity, so I
expect an uphill battle - I'm trying 3.1 because 3.0 kept freezing
whenever I introduced a problem.

(2) In this same grammar, there are around 100 keywords, grouped into
sets which the original hand-coded parser turns on and off depending
on context. Moreover, some keywords are part of more than one set, and
each set can be enabled independently. My implementation matches every
keyword as a token, then uses parser rules to group them into 16
unique sets and combine the groups into various identifier contexts.
For example, here's the rule for an identifier in a context where only
statement and condition keywords are reserved:

statementConditionID
	:	ID | OPCODE | directiveKeywords | traceKeywords |
		directiveSubKeywords | miscKeywords |
		directiveOrTraceKeywords | traceOrMiscKeywords |
		directiveOrSubKeywords | directiveOrMiscKeywords |
		directiveSubOrSegmentKeywords |
		directiveSubOrMiscKeywords | directiveOrSubOrSegmentKeywords;

In this case, I'm excluding statementKeywords,
directiveOrStatementKeywords, directiveOrStatementOrMiscKeywords,
conditionKeywords, and directiveSubOrSegmentOrConditionKeywords. This
feels pretty silly. Is there a more sensible way to do it?

Jesse