[antlr-interest] lexer's rule overlapping problem

Sun Sep 6 10:19:36 PDT 2009

Непонятна Неизвесто wrote:
> Hello,
> 
> I would like to write my own translator, but I have run into one problem: overlapping lexer rules. I want to have two lexer rules like this:
> 
> ID : 'a'..'z'+;
> ENG : 'a'..'z';
> 
> and
> 
> INT : '0'..'9'+;
> DIG : '0'..'9';
...
> word:
> 	'inputword' ':' (ID|INT) 'in' '['axsis']' ';'
> 	;                 //^ it is important that here was placed only one character
> 
> and then find input errors (like this: 123 in [1,2,3]) using the tree grammar

The problem with this approach is that lexing (dividing the input into
tokens) happens entirely before the parser gets to see anything. The
lexer therefore has no knowledge of the parse context (there is no
parse context yet), so when it sees

abcdef

it doesn't know whether that should be six ENG tokens or one ID token.

If you only need to ensure that a token is one character long, you can
do that with predicates. But if you want to recognise 123 as an INT in
one context and as three DIGs in another, you have a problem. One
solution would be to turn INT and ID into parser rules, int and id,
defined as DIG+ and ENG+ respectively. That would leave a messy tree,
though.
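
A minimal sketch of that last suggestion, assuming ANTLR 3 syntax and
the Java target. The grammar name and the rule names entry, id and
integer are made up for illustration (integer rather than int, because
int is a Java keyword and ANTLR names the generated methods after the
rules). The keyword literals from the original rule are left out on
purpose: with single-letter ENG tokens, a literal such as 'in' would
itself overlap with ENG in the lexer.

```antlr
grammar SingleChars;   // hypothetical name

// Parser rules: the parser has context, so it decides how many
// single-character tokens belong together.
entry   : (id | integer) '[' DIG (',' DIG)* ']' ';' EOF ;
id      : ENG+ ;       // one or more letters, assembled by the parser
integer : DIG+ ;       // one or more digits, assembled by the parser

// Lexer rules: every letter and every digit is its own token,
// so the two rules no longer overlap with multi-character rules.
ENG : 'a'..'z' ;
DIG : '0'..'9' ;
```

With this, 123[1,2,3]; should be lexed as three DIG tokens followed by
'[' DIG ',' DIG ',' DIG ']' ';'. The parser then treats the first
three digits as one integer, while each list element must be exactly
one DIG, so 123[12,3]; is rejected. Whitespace is not handled here,
and every letter or digit becomes its own tree node, which is the
messiness mentioned above.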

-- 
Sam Barnett-Cormack