[antlr-interest] Fundamental question on lexer rule ordering
Martin d'Anjou
martin.danjou at neterion.com
Tue Feb 27 12:06:37 PST 2007
Hi!
I must be missing something fundamental on lexer rule ordering, because I
keep running into the same problem over and over: re-ordering rules
changes the lexer from "working" to "failing", and I don't understand why.
Note that this is not the same question as my previous posting on fragment rules.
Take this input text, which contains a multiline comment:
int id;
int int_id;
int _int_id;
/*
nothing
*/
45b32
6h87z
I have two lexers, one that works and one that fails. This one works:
lexer grammar DUMMY_Lexer;
INT : 'int' ;
SEMI : ';' ;
WS : ( ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;
IDENTIFIER : ('a'..'z'|'A'..'Z'|'_')+;
NUMBER : DIGIT+ (BASE (DIGIT|'z'|'Z')+)? ;
ML_COMMENT : '/*' ( options {greedy=false;} : .)* '*/' {$channel=HIDDEN;} ;
fragment
BASE : 'b' | 'h';
fragment
DIGIT : '0'..'9';
This one does not work:
lexer grammar DUMMY_Lexer;
INT : 'int' ;
SEMI : ';' ;
WS : ( ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;
ML_COMMENT : '/*' ( options {greedy=false;} : .)* '*/' {$channel=HIDDEN;} ;
IDENTIFIER : ('a'..'z'|'A'..'Z'|'_')+ ;
NUMBER : DIGIT+ (BASE (DIGIT|'z'|'Z')+)? ;
fragment
BASE : 'b' | 'h';
fragment
DIGIT : '0'..'9';
The only difference is that ML_COMMENT is in a different position. I can
picture a machine consuming characters and trying to match tokens, but all
these tokens I'm lexing are very different and I don't understand how the
order could possibly matter in this case.
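To make that mental picture concrete, here is a minimal Python sketch of a "machine consuming characters and trying to match tokens". It is not ANTLR and makes no claim about how ANTLR 3 actually chooses a lexer rule; it assumes a simple longest-match scanner in which ties go to the rule listed first. The regular expressions are hand-written approximations of the rules above (the BASE and DIGIT fragments are inlined), and the names TEXT, RULES, HIDDEN, and tokenize are made up for this illustration.

```python
import re

# Input text from the post.
TEXT = "int id;\nint int_id;\nint _int_id;\n/*\nnothing\n*/\n45b32\n6h87z\n"

# Regex approximations of the lexer rules (assumption: these mirror the
# grammar closely enough for illustration; fragments are inlined).
RULES = {
    "INT": r"int",
    "SEMI": r";",
    "WS": r"[ \t\r\n]+",
    "IDENTIFIER": r"[a-zA-Z_]+",
    "NUMBER": r"[0-9]+(?:[bh][0-9zZ]+)?",
    "ML_COMMENT": r"/\*.*?\*/",  # non-greedy, like greedy=false
}

# Token types that would go to the hidden channel; they are simply dropped here.
HIDDEN = {"WS", "ML_COMMENT"}


def tokenize(text, order):
    """Scan text; the longest match wins, ties go to the earlier rule in order."""
    compiled = [(name, re.compile(RULES[name], re.DOTALL)) for name in order]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in compiled:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so on a tie
            # the rule that appears first in `order` is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in HIDDEN:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# Rule order of the working grammar and of the failing grammar.
WORKING = ["INT", "SEMI", "WS", "IDENTIFIER", "NUMBER", "ML_COMMENT"]
FAILING = ["INT", "SEMI", "WS", "ML_COMMENT", "IDENTIFIER", "NUMBER"]

tokens_a = tokenize(TEXT, WORKING)
tokens_b = tokenize(TEXT, FAILING)

for tok in tokens_a:
    print(tok)
print("same tokens in both orders:", tokens_a == tokens_b)
```

Under these assumptions both orderings yield the same token sequence, which is the intuition behind the question: in a plain longest-match model, moving ML_COMMENT should change nothing, so whatever makes the second grammar fail must come from something this model does not capture.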
I'd really like to understand. I apologize if this is in the manual; I
must have missed it.
Thanks!
Martin