[antlr-interest] lexer being fussy -why?

damarcus Leacock damalee5 at gmail.com
Wed May 16 10:30:05 PDT 2007


Hi,

I am working on a little language, using the lexer below. I was baffled to
notice that I get lexing errors when I use identifiers (in example programs
of that language) that share at least their first two characters with one of
the tokens I defined. That is despite the fact that the Identifier rule
has testLiterals set to true. So

var thau := ...

results in an error saying something like "found 'a' while expecting 'e'"
(I presume it is trying to match the THEN token).

I can of course solve this by raising k above the length of my longest token,
but I am a bit surprised that this would be necessary. Why does the lexer feel
the need to start screaming in this situation? Could anyone enlighten this
naive soul?
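
The symptom can be mimicked with a toy model. The Python sketch below is not
ANTLR's generated code; it only assumes that the lexer picks a rule from the
first k characters of the input and then commits to that rule. The names
lex_word and is_ident are made up for illustration, and the Identifier check
is much simpler than the Letter rule in the grammar.

```python
# Toy model only: NOT ANTLR's generated code, just the presumed behaviour.
KEYWORDS = ["if", "then", "else", "while", "do", "var",
            "skip", "return", "type", "lambda"]


def is_ident(text):
    # Simplified Identifier rule: a letter, then letters or digits.
    return text[:1].isalpha() and text.isalnum()


def lex_word(text, k):
    """Pick a rule from the first k characters, then commit to it."""
    for kw in KEYWORDS:
        if text[:k] == kw[:k]:
            # Committed to the keyword rule: every character must now match.
            for i, expected in enumerate(kw):
                found = text[i] if i < len(text) else "<EOF>"
                if found != expected:
                    raise SyntaxError(
                        f"found '{found}' while expecting '{expected}'")
            # The model reports only the first token and ignores the rest.
            return kw.upper()
    if is_ident(text):
        return "Identifier"
    raise SyntaxError(f"unexpected input '{text}'")
```

In this model lex_word("thau", 2) commits to the keyword rule for "then" after
seeing "th" and raises the error quoted above, while lex_word("thau", 7)
falls through to Identifier because the longer lookahead no longer matches
any keyword.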

thanx,
d.


----------------------
vardecl
  : VAR^ Identifier ASSIGN! expr
  ;

-------------------------
class MyLexer extends Lexer;

options {
k = 2;
charVocabulary = '\0'..'\377'; // 8-bit characters only (0..255), not full Unicode
testLiterals=false; // don't automatically test for literals
}


IF : "if";
THEN : "then";
ELSE : "else";
WHILE : "while";
DO : "do";
VAR : "var";
ASSIGN : ":=";
SKIP : "skip";
RETURN : "return";
TYPE : "type";
LAMBDA : "lambda";

COLON : ':';
SEMICOLON : ';';
COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LCURLY : '{';
RCURLY : '}';
LSQBRACKET : '[';
RSQBRACKET : ']';
DOT : '.';
MU : '@';
ARROW : "->";
PLUS : '+';
MINUS : '-';
TIMES : '*';
DIV : '/';
MOD : '%';
CONCAT : '^';
EQUALS : "==";
LEQ : "<=";
EQUALSIGN : '=';


Identifier
options {testLiterals=true;}
: Letter (Letter|Digit)*
;

protected Letter //from java grammar
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
protected Digit
: '0'..'9'
;

StringLiteral
: '"'! ( ~('"'|'\n'|'\r') )* '"'!
// : '"'! ( '\\''"' | ~('"'|'\n'|'\r') )* '"'!
;
NumberLiteral
: ('0' | '1'..'9' ('0'..'9')*)
;
BoolLiteral
: ( "true" | "false" )
;
Comment
: "//" (~('\n'|'\r'))*
{ $setType(Token.SKIP); }
;

WS
: ( ' '
| '\t'
| '\f'

// handle newlines
| ( "\r\n" // DOS/Windows
| '\r' // Macintosh
| '\n' // Unix
)
// increment the line count in the scanner
{ newline(); }
)
{ $setType(Token.SKIP); }
;