[antlr-interest] lisp-like issues

Brannon King BBKing at starbridgesystems.com
Mon Dec 19 15:26:16 PST 2005


Suppose I have a language whose syntax looks like the following:
 
(funcName returnName param1 param2 (funcName ...) ...)
 
(The actual language is EDIF, which is LISP-like.)
 
I set up all my built-in funcNames as tokens in the Lexer. The Lexer's
IDENTIFIER rule picks up funcName, returnName, the params, etc. However,
nothing in the language requires the returnName or the params to
differ from the built-in function names; a function name is simply
the first item in a list. So when I tell the IDENTIFIER rule to
testLiterals, I get errors whenever one of my other params has the same
name as a function. When I tell it not to test literals, I get an error
on the very first function that looks like this: expecting "funcName",
found 'funcName'. I don't know what the double quotes versus the single
quotes signify in that error. Can someone explain that to me?

After that I tried exporting the Parser's vocab to the Lexer. That
seemed to get rid of the above error, though I wish I knew why. However,
it created a new error, which shows up when I try to print the AST with
toStringTree and looks like: expecting FUNCNAME, found 'funcName'.

What I don't understand is:
1. why I cannot do a testLiterals in the parser instead of the Lexer,
or how I should do it
2. why I would need to export the vocabularies under any circumstance
when I have all the tokens already listed in the Lexer's tokens section
3. what the advantages are of having the tokens in the lexer
4. why the toStringTree cannot find EDIF (am I supposed to importVocab
into that somehow?)
5. how I could test for literals only when there is an immediately
preceding open paren
6. whether I should be including the opening paren and WS as part of
the keywords
7. with a simple language like LISP, where the entirety of the hierarchy
is hardcoded into the parentheses, whether I am even on the right track
with code like the following

Here is some of the code:

class EDIFParser extends Parser;
options
{
	k = 2;
	buildAST = true;
	exportVocab = EDIF;
}
edif :
	LP! EDIF^ edifFileNameDef
	RP!;
edifFileNameDef :
	IDENTIFIER | name | rename;
name :
	LP! NAME^ IDENTIFIER ( display )* RP!;
...

class EDIFLexer extends Lexer;
options
{
	k = 2; //2 for the double char newline
	testLiterals = false;
	charVocabulary = '\3'..'\377'; //ascii
	caseSensitiveLiterals = false;
	caseSensitive = false;
	importVocab = EDIF;		// Call its vocabulary "EDIF"
}
tokens
{
	EDIF="edif";
}

IDENTIFIER
	options {testLiterals=false;}
	:	(ALPHA|'&')	a:(options {greedy=true;}: '_'|ALPHA|DIGIT)*
		{(a == null)? true: (a.getText().length() < 256)}? ; // 256 max chars
protected DIGIT: ('0'..'9');
protected ALPHA: ('a'..'z');
WHITESPACE	: ( ' ' | '\t' | '\f'
		| ( options { generateAmbigWarnings=false; }
					: "\r\n"  // DOS/Windows
					| '\r'	// Macintosh
					| '\n'	// UNIX
					) { newline(); }
		)
		{ $setType(Token.SKIP); }
		;
LP		: "(" ;
RP		: ")" ;
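
To make the "test literals only after an open paren" idea concrete, here
is a minimal sketch of the behavior I am after. It is plain standalone
Java, not ANTLR-generated code and not anything from the ANTLR runtime:
the class name, the keyword table, and the token strings are all
hypothetical, and it assumes that whitespace between the paren and the
keyword does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical standalone sketch; not part of ANTLR or of the grammar above.
class HeadKeywordSketch {
    // Assumed keyword table; the real EDIF keyword list is much longer.
    static final Set<String> KEYWORDS = Set.of("edif", "name", "rename");

    // Produces token descriptions such as "LP", "KEYWORD(edif)", "IDENTIFIER(edif)".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean previousWasLP = false; // true only when the last real token was "("
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // skipped, flag is kept
            if (c == '(') { tokens.add("LP"); previousWasLP = true; i++; continue; }
            if (c == ')') { tokens.add("RP"); previousWasLP = false; i++; continue; }
            int start = i;
            while (i < input.length()
                    && !Character.isWhitespace(input.charAt(i))
                    && input.charAt(i) != '(' && input.charAt(i) != ')') {
                i++;
            }
            String text = input.substring(start, i);
            // Consult the keyword table only in head position, ignoring case.
            boolean isKeyword = previousWasLP
                    && KEYWORDS.contains(text.toLowerCase(Locale.ROOT));
            tokens.add((isKeyword ? "KEYWORD(" : "IDENTIFIER(") + text + ")");
            previousWasLP = false;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [LP, KEYWORD(edif), IDENTIFIER(edif), LP, KEYWORD(name), IDENTIFIER(name), RP, RP]
        System.out.println(tokenize("(edif edif (name name))"));
    }
}
```

In other words, the second "edif" and the second "name" should come out
as plain identifiers because they are not the first item in their list.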

Thanks again for your time.

