[antlr-interest] lexer's rule overlapping problem
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Sun Sep 6 10:19:36 PDT 2009
Непонятна Неизвесто wrote:
> Hello,
>
> I would like to write my own translator but have run into one problem: overlapping lexer rules. I want to have two lexer rules like this:
>
> ID  : ('a'..'z')+ ;
> ENG : 'a'..'z' ;
>
> and
>
> INT : ('0'..'9')+ ;
> DIG : '0'..'9' ;
...
> word:
> 'inputword' ':' (ID|INT) 'in' '['axsis']' ';'
> ; //^ it is important that only one character is placed here
>
> and then find input errors (such as 123 in [1,2,3]) with the tree grammar
The problem with your thinking is that the lexing (dividing into tokens)
all happens before the parser gets to see anything. Thus the lexer has
no knowledge of the parse context (because there is no parse context
yet), so when it sees
abcdef
it doesn't know whether it should be six ENG tokens or one ID token.
If you have a situation where you want to ensure a token is only one
character long, you can do that with predicates, but if you want to
recognise 123 as an INT in one context and as three DIGs in another, you
have a problem. One solution would be to make INT and ID into parser
rules, int and id, formed as DIG+ and ENG+ respectively. That would leave
a messy tree, though, since every number or word becomes a run of
single-character nodes.
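
The parser-rule approach described above might look roughly like the following. This is only a sketch in ANTLR 3 syntax, not a tested grammar: the grammar name Chars and the rule single are hypothetical, and the int rule is called integer here on the assumption that the Java target is used, where a rule named int would clash with the Java keyword in the generated parser.

```antlr
grammar Chars;   // hypothetical name; ANTLR 3 expects it to match the file name

// Parser rules: multi-character forms are built from single-character tokens,
// so the parser (which knows the context) decides how to group them.
id      : ENG+ ;
integer : DIG+ ;   // named "integer" rather than "int" (assumption: Java target)

// Hypothetical rule for a position that must hold exactly one character:
// it references the single-character tokens directly.
single  : ENG | DIG ;

// Lexer rules: every token is exactly one character, so no two rules overlap.
ENG : 'a'..'z' ;
DIG : '0'..'9' ;
```

One thing to watch with this layout: if whitespace is skipped or hidden by the lexer, the parser can no longer tell 12 3 apart from 123, because both arrive as the same three DIG tokens.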
--
Sam Barnett-Cormack