[antlr-interest] [Antlr 3] lexer problem

Tue May 15 03:37:34 PDT 2007

> Yes, I know. But I'd expect that the lexer tracks back when it can
> not complete the optional ('.' DIGIT)? part.

For that to happen, the lexer would have to know what the parser expects,
and I do not think that is how the lexer works. It knows nothing about the
parser, and the parser knows nothing about the lexer; it only cares about
the token stream. As I understand it, backtracking can happen within either
one of them, but not across the boundary between them (unless you implement
it yourself, of course).

So in your case you could disambiguate your lexer with syntactic
predicates, I guess. I have never used them in lexers, but if they work
the same way as they do in parsers, it should look something like this:

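INTORFLOAT
  : (('0'..'9')+ '.' '0'..'9') => FLOAT { $type = FLOAT; }
  | INT { $type = INT; }
  ;

// fragments: only reachable through INTORFLOAT, never matched directly
fragment FLOAT : ('0'..'9')+ '.' ('0'..'9')+ ;
fragment INT   : ('0'..'9')+ ;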
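
To make the idea behind the predicate concrete, here is a small toy model
in Python. It is only a sketch of "peek before you commit", not ANTLR's
generated code; the names scan_digits and next_number are made up for this
illustration.

```python
DIGITS = "0123456789"


def scan_digits(text, pos):
    """Return the index just past the run of digits starting at pos."""
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos


def next_number(text, pos=0):
    """Return (token_type, lexeme, end) for the number starting at pos."""
    end = scan_digits(text, pos)
    if end == pos:
        raise ValueError("expected a digit at index %d" % pos)
    # "Predicate": peek for '.' followed by a digit without consuming anything.
    if end + 1 < len(text) and text[end] == "." and text[end + 1] in DIGITS:
        end = scan_digits(text, end + 1)
        return ("FLOAT", text[pos:end], end)
    # No digit after the '.', so the '.' is left for the next token.
    return ("INT", text[pos:end], end)
```

With this, next_number("42.foo") stops after "42" and reports an INT,
leaving ".foo" in the input, while next_number("42.5") reports a FLOAT.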

> So it just consumes 42 (because it is a valid FLOAT, too).

The lexer is a greedy beast: for each token it matches the longest
character sequence it can. In your case the lexer gets stuck after the
'.': once it has consumed the '.', it expects a digit and does not back up
to emit just '42'. To help it you have to be more specific, hence the
syntactic predicates.
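
The "gets stuck" behaviour can be imitated with a few lines of Python.
This is a toy model of a rule that commits to the optional fraction as
soon as it sees a '.', assuming one character of lookahead; it is not
ANTLR's actual implementation, and match_float_committed is a made-up name.

```python
DIGITS = "0123456789"


def match_float_committed(text, pos=0):
    """Toy model of FLOAT : DIGIT ('.' DIGIT)? with no backing up.

    Returns the end index of the match. Raises ValueError when the rule
    has already consumed the '.' and cannot finish the fraction.
    """
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise ValueError("expected a digit at index %d" % pos)
    # One character of lookahead decides whether to enter ('.' DIGIT)?.
    if pos < len(text) and text[pos] == ".":
        pos += 1  # the '.' is consumed; this model never gives it back
        frac_start = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        if pos == frac_start:
            raise ValueError("expected a digit after '.' at index %d" % pos)
    return pos
```

Here "42" and "42.5" match completely, but "42.foo" raises an error
instead of falling back to "42", which mirrors the problem in the
question.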

> The parser behaves like this, but the lexer not.
> I'd expected that the following two grammars would successfully parse
> '42.foo'
>
> // float as lexer rule
> start : FLOAT ('.' foo)?;
> FLOAT : DIGIT ('.' DIGIT)?;
> DIGIT : '0'..'9'+;
>
> // float as parser rule
> start : float ('.' foo)?;
> float : DIGIT ('.' DIGIT)?;
> DIGIT : '0'..'9'+;
>
> Only the second one works...