[antlr-interest] Re: Lexer problem
manfredlotz
Manfred.Lotz at web.de
Sat Dec 29 01:05:45 PST 2001
--- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
>
> On Thursday, December 27, 2001, at 09:40 PM, manfredlotz wrote:
>
> > Using the definition shown below, the lexer has a problem recognizing
> > tokens if they arrive in a certain order.
> >
> > E.g., ".a.a.b$" or ".ac$" work fine, however ".a.b$" does not.
> >
> > Any idea what I have to change to make it work? I know it has to do
> > with the fact that ".a" is a substring of ".a.a", but I don't know how
> > to improve the t.g file.
> >
> > Manfred
> > _____________________
> >
> > class TLexer extends Lexer;
> > options {
> > k=4;
> > filter=IGNORE;
> > }
> >
> > MYENDTOKEN : '$' ;
> >
> > A : ( ".a" | ".a.a" | ".b" | 'a' | 'c' ) ;
> >
> > protected
> > IGNORE : ( "\r\n" | '\r' | '\n' ) { _ttype = Token.SKIP; };
>
> Looks ok to me. Can you be more specific about what it says? Try using
> -traceLexer on antlr cmd-line and then compile and run it. See where
> it's going.
>
> Ter
> --
> Chief Scientist & Co-founder, http://www.jguru.com
> Creator, ANTLR Parser Generator: http://www.antlr.org

OK, I ran the test with -traceLexer on. Here is the output:
$ java TestLexer .a.b
> lexer mA; c==.
< lexer mA; c==b
> lexer mIGNORE; c==.
< lexer mIGNORE; c==.
line 1: unexpected char: .
> lexer mA; c==a
< lexer mA; c==.
Token: ["a",<5>,line=1,col=5]
> lexer mA; c==.
< lexer mA; c==$
Token: [".b",<5>,line=1,col=6]
> lexer mMYENDTOKEN; c==$
< lexer mMYENDTOKEN; c==?
Token: ["$",<4>,line=1,col=8]
The problem seems to be that after scanning ".a." the lexer assumes the
token has to be ".a.a". After finding a 'b' it should be able to go back
to the second dot and recognize that what follows is ".b", which is a
valid token.
Can I do something to help the lexer?
Manfred
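
The behaviour asked for above (fall back to ".a" when ".a." turns out
not to be the start of ".a.a") can be sketched outside ANTLR. The Java
below is only an illustration under stated assumptions: the class name
LongestMatchSketch and the method tokenize are hypothetical, the literal
list is copied by hand from rules A and MYENDTOKEN, and none of this is
ANTLR-generated code or an ANTLR API. It picks the longest literal that
matches completely at the current position, so a partial match never
commits the scanner to the longer alternative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not ANTLR-generated code.
final class LongestMatchSketch {

    // Literals copied by hand from rules A and MYENDTOKEN in the grammar.
    private static final String[] LITERALS = { ".a.a", ".a", ".b", "a", "c", "$" };

    // Splits the input into tokens, always taking the longest literal
    // that matches completely at the current position.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            for (String literal : LITERALS) {
                // startsWith compares the whole literal, so ".a." followed
                // by 'b' is never accepted as the beginning of ".a.a".
                if (input.startsWith(literal, pos)
                        && (best == null || literal.length() > best.length())) {
                    best = literal;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException(
                        "unexpected char: " + input.charAt(pos) + " at offset " + pos);
            }
            tokens.add(best);
            pos += best.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(".a.b$"));   // prints [.a, .b, $]
        System.out.println(tokenize(".a.a.b$")); // prints [.a.a, .b, $]
        System.out.println(tokenize(".ac$"));    // prints [.a, c, $]
    }
}
```

With this approach the failing input ".a.b$" splits into ".a", ".b" and
"$", while ".a.a.b$" still yields ".a.a" first. The sketch does not skip
line ends the way the IGNORE rule does; it throws on any character that
starts no literal.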