[antlr-interest] Re: Lexer problem

Sat Dec 29 01:05:45 PST 2001

--- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
> 
> On Thursday, December 27, 2001, at 09:40  PM, manfredlotz wrote:
> 
> > Using the definition shown below, the lexer has a problem
> > recognizing tokens if they arrive in a certain order.
> >
> > E.g.,  ".a.a.b$" or ".ac$" work fine, however ".a.b$" does not.
> >
> > Any idea what I have to change to make it work? I know it has to do
> > with the fact that ".a" is a substring of ".a.a", but I don't know
> > how to improve the t.g file.
> >
> > Manfred
> > _____________________
> >
> > class TLexer extends Lexer;
> > options {
> >         k=4;
> >         filter=IGNORE;
> > }
> >
> > MYENDTOKEN   : '$' ;
> >
> > A       :   ( ".a" | ".a.a" | ".b" | 'a' | 'c' )  ;
> >
> > protected
> > IGNORE  : ( "\r\n" | '\r' | '\n'  ) { _ttype = Token.SKIP; };
> 
> Looks ok to me.  Can you be more specific about what it says?  Try
> using -traceLexer on antlr cmd-line and then compile and run it.  See
> where it's going.
> 
> Ter
> --
> Chief Scientist & Co-founder, http://www.jguru.com
> Creator, ANTLR Parser Generator: http://www.antlr.org

Ok, I ran the test with traceLexer on. Here is the output:

$ java TestLexer .a.b

 > lexer mA; c==.
 < lexer mA; c==b
 > lexer mIGNORE; c==.
 < lexer mIGNORE; c==.
line 1: unexpected char: .
 > lexer mA; c==a
 < lexer mA; c==.
Token: ["a",<5>,line=1,col=5]
 > lexer mA; c==.
 < lexer mA; c==$
Token: [".b",<5>,line=1,col=6]
 > lexer mMYENDTOKEN; c==$
 < lexer mMYENDTOKEN; c==?
Token: ["$",<4>,line=1,col=8]

The problem seems to be that after scanning ".a." it assumes the token
must be ".a.a". When it then finds a 'b', it should be able to go back
to the dot and recognize that we now have ".b", which is a valid token.

Can I do something to help the lexer?

Manfred
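
The behaviour asked for above (prefer ".a.a", but fall back to ".a" and
resume at the following dot when the longer literal does not fully
match) can be sketched by hand as follows. This is only an illustration
of the intended tokenization, not ANTLR-generated code: the class name
LongestMatchSketch and the method tokenize are hypothetical, and the
literal list is copied from rule A and MYENDTOKEN in the grammar. Inside
the grammar itself, a syntactic predicate on the longer alternative is
probably the ANTLR 2 way to get comparable behaviour, but that is an
assumption and is not tested here.

```java
import java.util.ArrayList;
import java.util.List;

public class LongestMatchSketch {
    // Literals copied from rule A and MYENDTOKEN in the grammar above.
    static final String[] LITERALS = { ".a.a", ".a", ".b", "a", "c", "$" };

    // Returns the token texts, always choosing the longest literal that
    // matches completely at the current position. A partial match of
    // ".a.a" (such as ".a." followed by 'b') is never committed to, so
    // the scan resumes at the second dot.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            for (String lit : LITERALS) {
                if (input.startsWith(lit, pos)
                        && (best == null || lit.length() > best.length())) {
                    best = lit;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException(
                    "unexpected char: " + input.charAt(pos));
            }
            tokens.add(best);
            pos += best.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(".a.b$"));   // the failing case
        System.out.println(tokenize(".a.a.b$")); // already worked
        System.out.println(tokenize(".ac$"));    // already worked
    }
}
```

With this approach ".a.b$" splits into ".a", ".b", "$", while ".a.a.b$"
still yields ".a.a" as its first token.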
