[antlr-interest] v3 lexer cannot tell keyword from identifier (very strange)
Martin d'Anjou
martin.danjou at neterion.com
Thu Feb 22 13:30:35 PST 2007
Hi,
I have a very strange problem in ANTLR 3.0b6. Given the input text:
int id;
int int_id;
I get the error:
line 2:4 mismatched input 'int' expecting IDENTIFIER
It is mistaking "int_id" for "int", treating the underscore as a token
separator. The (ridiculous looking) lexer is:
lexer grammar DUMMY_Lexer;
options { filter=true; }
MOD : 'mod' ;
END : 'end' ;
DEF : 'def' ;
INC : 'inc' ;
PAR : 'par' ;
INP : 'inp' ;
OUT : 'out' ;
INO : 'ino' ;
INT : 'int' ;
WER : 'wer' ;
COMMA : ',' ;
SEMI : ';' ;
L_PAREN : '(' ;
R_PAREN : ')' ;
ASSIGN : '=' ;
SHARP : '#' ;
LSHIFT : '<<' ;
MULT : '*' ;
MINUS : '-' ;
PLUS : '+' ;
COLON : ':' ;
LTEQ : '<=' ;
L_CURLY : '{' ;
R_CURLY : '}' ;
OR : '|' ;
SQUARE : '[]' ;
QUOTE : '"' ;
DIGIT : '0' ;
WS : ( ' ' | EOL )+ {$channel=HIDDEN;} ;
EOL : ('\r\n'|'\r'|'\n') ;
LetterC : 'c' | Nothing ;
Nothing : 't' ;
SL_COMMENT :'a';
ML_COMMENT : '/' ;
BASE : 'b' ;
BASE_NUM : DIGIT+ (BASE DIGIT+)? ;
IDENTIFIER : ('a'..'z'|UNDERSCORE)+ ;
fragment
UNDERSCORE : '_' ;
The only token I was able to take out without changing the error was the
QUESTION : '?'; token. When I remove any other token (MOD, for example),
the error changes to:
line 1:0 required (...)+ loop did not match anything at input 'int'
Which makes it even weirder...
Now the parser is fairly minimal:
parser grammar DUMMY_Parser;
options {
    tokenVocab=DUMMY_Lexer;
}
source_text :
    int_defs+
    ;
int_defs :
    INT { System.out.print("int "); }
    id=IDENTIFIER { System.out.print($id.text); }
    SEMI { System.out.println(";"); }
    ;
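A minimal sketch of how the "int" / "_id" split reported at 2:4 could arise, written in plain Java rather than ANTLR. It rests on an assumption that nothing above confirms: that with filter=true the lexer tries the rules in the order they are listed and accepts the first one that matches, instead of preferring the longest match. The class RuleOrderSketch and its tokenize method are hypothetical names made up for illustration, and only INT, SEMI, WS and IDENTIFIER are modelled, in the same relative order as in DUMMY_Lexer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a lexer; this is NOT ANTLR code.
// It models four rules of DUMMY_Lexer, kept in their listed order.
class RuleOrderSketch {
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("INT", Pattern.compile("int"));
        RULES.put("SEMI", Pattern.compile(";"));
        RULES.put("WS", Pattern.compile("[ \\r\\n]+"));
        RULES.put("IDENTIFIER", Pattern.compile("[a-z_]+"));
    }

    // firstMatchWins = true : ASSUMED filter-style behaviour, the first
    //                         listed rule that matches is accepted.
    // firstMatchWins = false: the longest match is accepted; on a tie the
    //                         earlier rule is kept.
    static List<String> tokenize(String input, boolean firstMatchWins) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                    if (firstMatchWins) {
                        break;
                    }
                }
            }
            if (bestName == null) {
                pos++; // no rule matched: skip one character
                continue;
            }
            if (!bestName.equals("WS")) { // whitespace is dropped here
                tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [INT(int), INT(int), IDENTIFIER(_id), SEMI(;)]
        System.out.println(tokenize("int int_id;", true));
        // Prints [INT(int), IDENTIFIER(int_id), SEMI(;)]
        System.out.println(tokenize("int int_id;", false));
    }
}
```

Under the first-match assumption the second token on the line is INT where int_defs expects IDENTIFIER, which is consistent with the mismatched-input message. Whether ANTLR 3.0b6 really behaves this way in filter mode would need to be checked against the tool itself.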
Help!!! (and thanks!)
Martin