[antlr-interest] [newbie] Lexer Confusion

Sat Jul 5 08:26:35 PDT 2008

UW Student wrote:
 > Johannes Luber wrote:
 >  > UW Student wrote:
 >  >>>> I would really prefer to have a single token.  Is it possible to
 >  >>>> modify Johannes' version to handle that?
 >  >>>
 >  >>> Try this:
 >  >>>
 >  >>> TERM1: '.' ( ('.')=> '.' {$type = TERM2;} )* ;
 >  >>
 >  >> Will that ensure that the number of DOTs consumed is even?  If I
 >  >> understand correctly, it will simply catch any sequence of more than
 >  >> one DOT.
 >  >>
 >  >> -Andrew
 >  >>
 >  >
 >  > No, it won't. Try this:
 >  >
 >  > TERM1: '.' ( ('.')=> '.' {$type = TERM2;} '..'* ) ;
 >  >
 >  > But I wonder: do you really need to create such a rule for a particular
 >  > language? Using a regex should be faster there anyway.
 >  >
 >  > Johannes
 >  >
 >
 > Doesn't that have the original problem?  If there are three DOTs, then
 > it will fail with a mismatched token exception, won't it?
 >
 > The '...'+ tokens are filler (like whitespace or comments) in the
 > language I'm translating.  It would be much easier to look past them if
 > they were lumped together.
 >
 > I agree that a regex would be a good solution for matching this token. I
 > was hoping the ANTLR lexer provided that kind of regex support.

If you want to treat '..' as filler, why don't you change the channel of
the TERM1 and TERM2 tokens (e.g. to HIDDEN)? That way the number of tokens
is irrelevant (beyond a small increase in the memory footprint), and your
grammar can ignore those tokens at a later stage.
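A minimal sketch of that suggestion, assuming an ANTLR 3 grammar with the default Java target, and assuming TERM1 matches a single '.' and TERM2 matches '..' (the rule bodies are assumptions; only the token names come from this thread). Tokens sent to the HIDDEN channel are still produced by the lexer, but a parser reading from a CommonTokenStream only sees default-channel tokens, so the parser rules never have to mention them:

```
// Hypothetical ANTLR 3 lexer rules (Java target). The rule bodies are
// assumed; the thread only names the tokens TERM1 and TERM2.

// A lone '.', e.g. the last dot of an odd-length run.
TERM1 : '.'  { $channel = HIDDEN; } ;

// A pair of dots. When a second '.' follows, the lexer predicts TERM2,
// just as with the usual '=' vs '==' pair of rules.
TERM2 : '..' { $channel = HIDDEN; } ;
```

With these rules, three dots would come out as TERM2 followed by TERM1, both hidden. How many filler tokens a run produces stops mattering once the parser no longer sees them.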
 >
 > Thanks,
 > Andrew
 >
 > p.s. Is this thread starting to clutter the mailing list?  At what point
 > is it appropriate to take it offline?
 >
As long as it is about ANTLR, you can use the mailing list.

Johannes