[antlr-interest] ANTLR equivalent of JavaCC Lexer behaviour?
Richard.Kennard at mail.thomson.com
Mon Mar 27 15:21:11 PST 2006
Dear All,
I am looking to migrate an existing grammar from JavaCC to ANTLR, but am having difficulty with the Lexer.
Specifically, my grammar is very 'English-y', and while JavaCC appears to employ (I'm guessing here) a rather forgiving 'longest match' Lexer, ANTLR warns me to specify an explicit 'k=x' lookahead depth. I have found this number needs to be pretty large (17) to stop the warning, at which point ANTLR seems to crash (and besides, http://www.antlr.org/doc/options.html warns against it, saying 'at large depths will include almost everything').
Here is a snippet of my working JavaCC grammar...
PARSER_END( BusinessLanguage )
TOKEN :
{
< EQUALS: "is" | "is the same as" | "the same as" | "are" | "are the same as" | "of" >
| < NOT_EQUALS: "is not" | "is not the same as" >
| < LESS_THAN: "is less than" >
| < IDENTIFIER: <LETTER> (<LETTER>|<DIGIT>)* >
}
...and the sort of thing it parses...
if Status is "Closed" then error "Already closed"
if Version is less than 1 then error "Version cannot be less than 1"
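To make the 'longest match' behaviour concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not JavaCC's actual implementation. It assumes that <LETTER> and <DIGIT> are plain ASCII letters and digits, that whitespace is skipped, and that ties go to whichever rule is listed first. STRING and NUMBER are hypothetical tokens added only so the sample input lexes completely; they are not in the snippet above.

```python
import re

# Literal phrases, in the order they appear in the JavaCC snippet.
PHRASES = [
    ("EQUALS", ["is", "is the same as", "the same as", "are", "are the same as", "of"]),
    ("NOT_EQUALS", ["is not", "is not the same as"]),
    ("LESS_THAN", ["is less than"]),
]

# One compiled pattern per alternative, so every candidate is tried separately.
RULES = [(kind, re.compile(re.escape(p))) for kind, alts in PHRASES for p in alts]
# Assumption: LETTER is [A-Za-z] and DIGIT is [0-9].
RULES.append(("IDENTIFIER", re.compile(r"[A-Za-z][A-Za-z0-9]*")))
# Hypothetical tokens, not part of the original snippet.
RULES.append(("STRING", re.compile(r'"[^"]*"')))
RULES.append(("NUMBER", re.compile(r"[0-9]+")))


def tokenize(text):
    """Return (kind, lexeme) pairs using longest match; ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is skipped
            pos += 1
            continue
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Accept only strictly longer matches, so the earliest rule wins a tie.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:]!r}")
        tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

With this sketch, "is less than" comes out as a single LESS_THAN token because it is longer than the competing "is", while a word such as "island" stays an IDENTIFIER because the identifier match is longer than the literal "is".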
...and here is what I tried in ANTLR...
class BusinessLexer extends Lexer;
options
{
k=17;
}
EQUALS: "is" | "is the same as" | "the same as" | "are" | "are the same as" | "of";
NOT_EQUALS: "is not" | "is not the same as";
LESS_THAN: "is less than";
IDENTIFIER: ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*;
Clearly there is a lot of contention in this grammar, but is there a way to get the equivalent JavaCC behaviour? I would rather not have to code something along the lines of...
("is" ("not" | "less than")) | ("are" ( "the same as" ))
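The hand-factored rule above amounts to sharing common prefixes between phrases. As a rough sketch of that idea only (Python, at word granularity, not ANTLR syntax), the phrases can be loaded into a prefix tree and walked one word at a time, remembering the last point at which a complete phrase was seen. The names build_trie and match_phrase are hypothetical, and splitting the input on whitespace is a simplifying assumption that ignores quoted strings.

```python
# Literal phrases, in the order they appear in the grammar.
PHRASES = [
    ("EQUALS", ["is", "is the same as", "the same as", "are", "are the same as", "of"]),
    ("NOT_EQUALS", ["is not", "is not the same as"]),
    ("LESS_THAN", ["is less than"]),
]


def build_trie(phrases):
    """Build a nested dict keyed by word; the key None marks a complete phrase."""
    root = {}
    for kind, alternatives in phrases:
        for phrase in alternatives:
            node = root
            for word in phrase.split():
                node = node.setdefault(word, {})
            node.setdefault(None, kind)  # first definition of a phrase wins
    return root


def match_phrase(words, start, trie):
    """Return (kind, end_index) for the longest phrase starting at words[start], else None."""
    node, best = trie, None
    i = start
    while i < len(words) and words[i] in node:
        node = node[words[i]]
        i += 1
        if None in node:  # a complete phrase ends here; remember it and keep going
            best = (node[None], i)
    return best


TRIE = build_trie(PHRASES)
```

For example, match_phrase("Version is less than 1".split(), 1, TRIE) walks "is", "less", "than" and reports a LESS_THAN phrase ending at word index 4, needing only one word of lookahead at each step.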
Your wisdom is most appreciated :)
Regards,
Richard.