[antlr-interest] lexer's rule overlapping problem

Sun Sep 6 10:19:34 PDT 2009

Непонятна Неизвесто wrote:
> Hello,
>
> I would like to write my own translator but have run into one problem: overlapping lexer rules. I want to have two lexer rules like this:
>
> ID : 'a'..'z'+;
> ENG : 'a'..'z';
>
> and
>
> INT : '0'..'9'+;
> DIG : '0'..'9';
>
> I need it because it is necessary to catch input text errors while parsing.
>
> The purpose of these lexer rules is demonstrated below:
>
> word:
> 	'inputword' ':' (ENG|DIG) 'in' '[' axis ']' ';'
> 	;
> axis:  
>   	INT (',' INT)*
> 	;
>
> I have found only one solution to this problem:
>
> word:
> 	'inputword' ':' (ID|INT) 'in' '[' axis ']' ';'
> 	;                 //^ it is important that only one character appears here
>
> and then find input errors (such as 123 in [1,2,3]) in the tree grammar.
>
> Thank you,
Read the getting started articles on the Wiki and, if you can, buy a copy 
of the ANTLR book. You cannot do what you are trying to do like that. 
Lexer tokens must be unique and their generation is not controlled by 
the parser. Besides which, the general rule is to delay error messages 
until as late as possible (so generally not the lexer). In this case, 
just accept any number of digits, then check the length of the token 
text matches for each side of your 'in' - if not then you can issue a 
much more useful error message about type mismatching or something similar.

Jim
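
The check on token text suggested in the reply could look roughly like the sketch below. It is plain Java with no ANTLR dependency, and it assumes the intended constraint is the one from the original question: the value before 'in' must be exactly one character. The class name InputWordCheck, the method checkSingleChar and the wording of the message are hypothetical, chosen only for illustration. In a real grammar the token text and line number would come from the token matched by (ID|INT).

```java
import java.util.Optional;

// Hypothetical helper, not part of ANTLR: validates the text of the token
// matched by (ID|INT) after the lexer has accepted any number of characters.
final class InputWordCheck {

    // Returns an error message when the token text is not exactly one
    // character long, or an empty Optional when the text is acceptable.
    static Optional<String> checkSingleChar(String tokenText, int line) {
        if (tokenText.length() == 1) {
            return Optional.empty();
        }
        return Optional.of("line " + line
                + ": expected a single letter or digit before 'in', but found '"
                + tokenText + "' (" + tokenText.length() + " characters)");
    }

    public static void main(String[] args) {
        // A single digit passes the check.
        System.out.println(checkSingleChar("7", 1).orElse("ok"));
        // The input from the question, 123 in [1,2,3], is reported as an error.
        System.out.println(checkSingleChar("123", 2).orElse("ok"));
    }
}
```

Because the lexer keeps only the general ID and INT rules, there is no overlap to resolve, and the message can describe what was expected rather than report an unexpected token.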