[antlr-interest] question about lexer rules

Sat Dec 19 01:24:08 PST 2009

I have a question about lexer rules and their priorities:
does the order in which lexer rules are defined matter?

Some time ago I ran into trouble with the following lexer rules:

TIME    :       (DIGIT+'h' 
                |DIGIT+'m'
                |DIGIT+'s'
                |DIGIT+'h'+DIGIT+'m'
                |DIGIT+'h'+DIGIT+'m'+DIGIT+'s'
                |DIGIT+'m'+DIGIT+'s');
DIGIT           :       ('0'..'9');
NUMBER          :       DIGIT+; 
LOWERCASE       :       'a'..'z';
UPPERCASE       :       'A'..'Z';
IDENTIFIER_LOWER        :       (LOWERCASE|DIGIT|'_')+;
IDENTIFIER_UPPER        :       (UPPERCASE|DIGIT|'_')*;
NEWLINE         :       ('\r'|'\n'|'\r\n');
WS              :       (' '
                        |'\t'
                        |'\r''\n'
                        |'\n'
                        ) { skip(); };  
COMMENT         :       ('//' (~('\n'|'\r'))* NEWLINE+) {skip();};
QUOTE           :       '"';
STRING_LITERAL  :       '"' ('\u0020'|'\u0021'|'\u0023'..'\u007f')*  '"';

This worked for me. But when I moved TIME to the end of the
definitions, ANTLRWorks was no longer able to parse my examples.

My understanding of lexer rules is that the best rule wins,
where the best rule is the one that matches the most
characters. But what about TIME and IDENTIFIER_LOWER? Both
can match the same input sequence, for example 10m.
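
To make the overlap concrete, here is a small Python sketch. It is
not ANTLR: the regular expressions are my own rough translation of
TIME and IDENTIFIER_LOWER, and the names longest_prefix and compare
are made up for this illustration. It only shows how many characters
each rule could consume from the start of an input; it says nothing
about how ANTLR itself decides between the two rules.

```python
import re

# Hand-translated regex approximations of two rules from the grammar
# above (an assumption for illustration, not ANTLR-generated code).
# In the grammar, 'h'+ and 'm'+ mean "one or more", so h+ and m+ are kept.
TIME_ALTERNATIVES = [
    r"[0-9]+h",
    r"[0-9]+m",
    r"[0-9]+s",
    r"[0-9]+h+[0-9]+m",
    r"[0-9]+h+[0-9]+m+[0-9]+s",
    r"[0-9]+m+[0-9]+s",
]
IDENTIFIER_LOWER_ALTERNATIVES = [r"[a-z0-9_]+"]


def longest_prefix(alternatives, text):
    """Return the length of the longest prefix of text matched by any alternative."""
    best = 0
    for pattern in alternatives:
        m = re.match(pattern, text)
        if m:
            best = max(best, m.end())
    return best


def compare(text):
    """Return (TIME length, IDENTIFIER_LOWER length) for the given input."""
    return (
        longest_prefix(TIME_ALTERNATIVES, text),
        longest_prefix(IDENTIFIER_LOWER_ALTERNATIVES, text),
    )


if __name__ == "__main__":
    for sample in ["10m", "1h30m", "2h15m30s", "10min", "abc"]:
        print(sample, compare(sample))
```

For inputs such as 10m or 1h30m both lengths come out equal, so
"the rule matching the most characters" alone does not pick a
winner between TIME and IDENTIFIER_LOWER.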

regards,
markus