[antlr-interest] ANTLR equivalent of JavaCC Lexer behaviour?

Mon Mar 27 15:21:11 PST 2006

Dear All,

I am looking to migrate an existing grammar from JavaCC to ANTLR, but am having difficulty with the Lexer.

Specifically, my grammar is very 'English-y'. JavaCC appears to employ (I'm guessing here) a rather forgiving 'longest match' Lexer, whereas ANTLR warns me to specify an explicit 'k=x' lookahead depth. I have found this number needs to be pretty large (17) to stop the warning, at which point ANTLR seems to crash (and besides, http://www.antlr.org/doc/options.html warns against it, saying 'at large depths will include almost everything').

Here is a snippet of my working JavaCC grammar...

	PARSER_END( BusinessLanguage )

	TOKEN :
	{
		< EQUALS: "is" | "is the same as" | "the same as" | "are" | "are the same as" | "of" >
	|	< NOT_EQUALS: "is not" | "is not the same as" >
	|	< LESS_THAN: "is less than" >
	|	< IDENTIFIER: <LETTER> (<LETTER>|<DIGIT>)* >
	}  	

...and the sort of thing it parses...

	if Status is "Closed" then error "Already closed"
	if Version is less than 1 then error "Version cannot be less than 1"
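
To make the 'longest match' behaviour I am relying on concrete, here is a minimal sketch of it in plain Java. This is not JavaCC's or ANTLR's generated code, only an illustration: at each position it tries every phrase plus the identifier rule, keeps the longest candidate, and on a tie prefers the rule listed first. The class name LongestMatchSketch, the STRING and NUMBER tokens, and the use of Character.isLetter for <LETTER> are my own assumptions, added so that the sample lines above can be fed through it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of longest-match tokenizing; not generated by any tool.
class LongestMatchSketch {

    // Token kinds and their phrases, in the same order as the JavaCC TOKEN block.
    static final String[][] PHRASES = {
        {"EQUALS", "is", "is the same as", "the same as", "are", "are the same as", "of"},
        {"NOT_EQUALS", "is not", "is not the same as"},
        {"LESS_THAN", "is less than"},
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; continue; }

            // Assumption: double-quoted strings without escape sequences.
            if (c == '"') {
                int close = input.indexOf('"', pos + 1);
                if (close < 0) throw new IllegalArgumentException("unterminated string at " + pos);
                tokens.add("STRING(" + input.substring(pos + 1, close) + ")");
                pos = close + 1;
                continue;
            }

            // Try every phrase; a later phrase only wins if it is strictly longer.
            String bestKind = null;
            int bestLen = 0;
            for (String[] row : PHRASES) {
                for (int i = 1; i < row.length; i++) {
                    if (row[i].length() > bestLen && input.startsWith(row[i], pos)) {
                        bestKind = row[0];
                        bestLen = row[i].length();
                    }
                }
            }

            // IDENTIFIER (and an assumed NUMBER) come last, so they must be strictly longer too.
            int end = pos;
            String runKind = null;
            if (Character.isLetter(c)) {
                runKind = "IDENTIFIER";
                while (end < input.length() && Character.isLetterOrDigit(input.charAt(end))) end++;
            } else if (Character.isDigit(c)) {
                runKind = "NUMBER";
                while (end < input.length() && Character.isDigit(input.charAt(end))) end++;
            }
            if (end - pos > bestLen) { bestKind = runKind; bestLen = end - pos; }

            if (bestKind == null) throw new IllegalArgumentException("no token matches at " + pos);
            tokens.add(bestKind + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if Status is \"Closed\" then error \"Already closed\""));
        System.out.println(tokenize("if Version is less than 1 then error \"Version cannot be less than 1\""));
    }
}
```

With this sketch, "is less than" comes out as a single LESS_THAN token rather than EQUALS followed by two identifiers, and a word such as "island" stays an IDENTIFIER because the identifier match is longer than "is". Keywords like "if" and "then" show up as IDENTIFIER here only because the sketch leaves them out.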

...and here is what I tried in ANTLR...

	class BusinessLexer extends Lexer;

	options
	{
		k=17;
	}

	EQUALS: "is" | "is the same as" | "the same as" | "are" | "are the same as" | "of";
	NOT_EQUALS: "is not" | "is not the same as";
	LESS_THAN: "is less than";
	IDENTIFIER: ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*;

Clearly there is a lot of contention in this grammar, but is there a way to get the equivalent of the JavaCC behaviour? I would rather not have to left-factor the phrases by hand, along the lines of...

	("is" ("not" | "less than")) | ("are" ( "the same as" ))

Your wisdom is most appreciated :)

Regards,

Richard.