[antlr-interest] Literals not handled in grammars with import

Mon Aug 25 07:22:18 PDT 2008

Okay, further investigation shows that, because tokenVocab options are
ignored in slave grammars, ANTLR processes the reference to the literal
in the slave grammar and records it as an unassigned string. When it
then processes the master grammar, it imports the vocab and correctly
defines the token. But later, in
AssignTokenTypesBehavior.assignStringTypes, it gives the unassigned
string reference from the slave grammar a new type, overwriting the
correct definition from the vocab.
I fixed this by adding a check for an existing type, giving:
	protected void assignStringTypes(Grammar root) {
		// walk string literals assigning types to unassigned ones
		Set s = stringLiterals.keySet();
		for (Iterator it = s.iterator(); it.hasNext();) {
			String lit = (String) it.next();
			// ADDED
			// Check if literal was subsequently assigned a type
			Integer curType;
			if((curType = grammar.composite.stringLiteralToTypeMap.get(lit)) != null) {
				stringLiterals.put(lit, curType);
				continue;
			}
			// END ADDED
			Integer oldTypeI = (Integer)stringLiterals.get(lit);
			int oldType = oldTypeI.intValue();
			if ( oldType<Label.MIN_TOKEN_TYPE ) {
				Integer typeI = Utils.integer(root.getNewTokenType());
				stringLiterals.put(lit, typeI);
				// if string referenced in combined grammar parser rule,
				// automatically define in the generated lexer
				root.defineLexerRuleForStringLiteral(lit, typeI.intValue());
			}
		}
	}
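To see the overwrite in isolation, here is a minimal standalone sketch that
models the two maps with plain java.util collections. It is not ANTLR code:
the class name LiteralTypeDemo, the UNASSIGNED placeholder, the type numbers
7 and 8, and the guard flag are all made up for illustration, and the real
logic is the method above.

```java
import java.util.HashMap;
import java.util.Map;

public class LiteralTypeDemo {
    // Hypothetical placeholder standing in for "below Label.MIN_TOKEN_TYPE".
    static final int UNASSIGNED = -1;

    /**
     * Gives every literal in pending a token type. With guard enabled, a
     * literal that already has a type in vocab reuses that type instead of
     * receiving a fresh one.
     */
    static void assignStringTypes(Map<String, Integer> pending,
                                  Map<String, Integer> vocab,
                                  int nextFreeType,
                                  boolean guard) {
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            Integer known = vocab.get(e.getKey());
            if (guard && known != null) {
                e.setValue(known);          // keep the imported definition
                continue;
            }
            if (e.getValue() == UNASSIGNED) {
                e.setValue(nextFreeType++); // allocate a brand new type
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("'x'", 7);                // type imported via the vocab

        Map<String, Integer> pending = new HashMap<>();
        pending.put("'x'", UNASSIGNED);     // reference seen in the slave grammar

        assignStringTypes(pending, vocab, 8, false);
        System.out.println(pending.get("'x'")); // prints 8 (vocab type lost)

        pending.put("'x'", UNASSIGNED);
        assignStringTypes(pending, vocab, 8, true);
        System.out.println(pending.get("'x'")); // prints 7 (vocab type kept)
    }
}
```

Without the guard the literal ends up with a type that differs from the one
the parser and lexer use, which is the mismatch the test case below checks for.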
A test case for the problem is:
	public void testDelegatesSeeSameTokenTypeForLiteral() throws Exception {
		ErrorQueue equeue = new ErrorQueue();
		ErrorManager.setErrorListener(equeue);
		String parser =
			"grammar P;\n" +
			"options { output=AST; }" +
			"tokens { A;B;C; }" +
			"x : 'x'  {System.out.println(\"P.x\");};\n";
		if(!antlr("P.g", "P.g", parser, false))
			throw new Exception("Error generating parser");
		if(!compile("PLexer.java"))
			throw new Exception("Error compiling lexer");
		if(!compile("PParser.java"))
			throw new Exception("Error compiling parser");
		
		String slaveTreeParser =
			"tree grammar S;\n" +
			"tokens { A; }" +
			"x : 'x' {System.out.println(\"S.x\");};\n";
		if(!antlr("S.g", "S.g", slaveTreeParser, false))
			throw new Exception("Error generating slave tree parser");
		if(!compile("S.java"))
			throw new Exception("Error compiling slave tree parser");
		
		String masterTreeParser =
			"tree grammar M;\n" +
			"options { tokenVocab=P; }\n" +
			"import S;" +
			"tokens { A;B; }" +
			"x2 : 'x'  {System.out.println(\"M.x2\");};\n";
		if(!antlr("M.g", "M.g", masterTreeParser, false))
			throw new Exception("Error generating master tree parser");
		if(!compile("M.java"))
			throw new Exception("Error compiling master tree parser");
		
		writeFile(tmpdir, "input", "x");
		
		String found = rawExecRecognizer("PParser", "M", "PLexer", "x", "x",
			true, false, false, false);
		assertEquals(
			"P.x\n" +
			"S.x\n", found);

		found = rawExecRecognizer("PParser", "M", "PLexer", "x", "x2",
			true, false, false, false);
		assertEquals(
			"P.x\n" +
			"M.x2\n", found);
	}
I also noticed that, before the fix, the test case not only failed the
assert but also reported an exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 9
	at org.antlr.runtime.tree.TreeParser.getMissingSymbol(TreeParser.java:82)
	at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:624)
	at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:117)
	at M_S.x(M_S.java:45)
	at M.x(M.java:62)
	at Test.main(Test.java:19)
But I think that may be because the error above introduced a
discontinuity in token numbers. There was no token with ID 8, so ID 9
went off the end of the tokenNames array.
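
As a rough illustration of that guess (not the actual ANTLR runtime code),
the sketch below packs token names into an array with one slot per defined
token and then checks a type from beyond a gap against it. The class
TokenGapDemo, both helper methods and the token numbers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenGapDemo {
    /** Builds a names array with one slot per defined token, ignoring gaps in the numbering. */
    static String[] packNames(Map<String, Integer> types) {
        return types.keySet().toArray(new String[0]);
    }

    /** True when the token type can safely be used as an index into names. */
    static boolean inRange(String[] names, int type) {
        return type >= 0 && type < names.length;
    }

    public static void main(String[] args) {
        Map<String, Integer> types = new LinkedHashMap<>();
        types.put("A", 0);
        types.put("B", 1);
        types.put("'x'", 3);                    // no token was given type 2

        String[] names = packNames(types);
        System.out.println(names.length);       // prints 3
        System.out.println(inRange(names, 3));  // prints false
        // names[3] would throw ArrayIndexOutOfBoundsException here
    }
}
```

With no gap the highest type would be 2 and every lookup would stay in range,
which fits the exception going away once the literal keeps its vocab type.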

Tom.
On Mon, Aug 25, 2008 at 5:08 PM, Thomas Brandon <tbrandonau at gmail.com> wrote:
> In 3.1, imported literals are not being handled correctly in tree
> grammars utilising import.
>
> Given the included grammars, while TestTree.g works properly, in
> TestTree2.g the literals are not correctly imported and new token
> types are assigned. This occurs in both rules from the imported
> grammar (tok1 in TestTree2_TestTree.java) and new rules in the
> importing grammar (tok1_2 in TestTree2.java).
>
> Tom.
>
> <Test.g>
> grammar Test;
>
> tok1: 'tok1';
> </Test.g>
>
> <TestTree.g>
> tree grammar TestTree;
>
> options {
>        tokenVocab=Test;
>        ASTLabelType = CommonTree;
> }
>
> tok1:
>        'tok1'
>        ;
> </TestTree.g>
>
> <TestTree2.g>
> tree grammar TestTree2;
>
> options {
>        tokenVocab=Test;
>        ASTLabelType = CommonTree;
> }
>
> import TestTree;
>
> tok1_2:
>        'tok1'
>        ;
> </TestTree2.g>
>