[antlr-interest] Look-ahead problem parsing phrase?

Sean O'Dell sean at celsoft.com
Tue Jun 30 07:36:16 PDT 2009


Still, this is helpful information. I did get past my issues, though... the
smoke in my eyes is gradually blowing away.

Sean

On Tue, Jun 30, 2009 at 6:26 AM, Dave Dutcher <dave at tridecap.com> wrote:

> >From: Sean O'Dell
> >Subject: Re: [antlr-interest] Look-ahead problem parsing phrase?
> >
> >Thanks ... that really did help. I think I didn't realize how much better
> >the parser is than the lexer at looking ahead. It makes much more sense to
> >me now, though I'm not yet sure how I will deal with tokenizing optional
> >trailing whitespace.
> >
> >I think, though, if I understand correctly: the lexer rule I build to
> >consume that should not be allowed to be empty. However, if it's optional,
> >I should indicate that in a parser rule and not the lexer rule.
> >
> >Maybe another way to say this is (and maybe it's been said, but I didn't
> >"hear" it correctly): lexer rules should strive to be completely
> >unambiguous and should match something, preferably immediately from the
> >left. Parser rules can have more complex look-ahead patterns.
>
>
> The lexer is really just as powerful as the parser, but the big difference
> is that Antlr chooses which lexer token to start with, whereas with a
> parser grammar you specify the starting rule.
>
> You originally posted this lexer grammar:
>
>    WS : (' '|'\t')+;
>    DIGIT : ('0'..'9');
>    LETTER : ('a'..'z'|'A'..'Z');
>    NEWLINE : '\r'? '\n';
>    WORD : (LETTER|DIGIT)+;
>    EOL : WS? NEWLINE?;
>    PHRASE : WORD (WS WORD)*;
>
> What you have to remember is that the lexer runs first and completely
> tokenizes the character stream before the parser runs. Also, Antlr itself
> decides which lexer token to match. The way I think of it is that Antlr
> basically adds another lexer rule that looks like this:
>
>    START : (WS | DIGIT | LETTER | NEWLINE | WORD | EOL | PHRASE);
>
> Now you can see how having to choose between PHRASE and WORD could be
> tricky: both can match a single word such as "hello". If a rule is marked
> as a fragment, it won't be included in the "START" rule.
>
> That said, this might not have been your problem, and Jim's solution might
> be all you need.
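>
> A minimal sketch of the original grammar reworked with fragment rules and
> with the optional trailing whitespace moved into a parser rule, assuming
> ANTLR 3 combined-grammar syntax. The grammar name Phrase and the parser
> rule phrase are hypothetical. Because DIGIT and LETTER are fragments, they
> drop out of the implicit "START" choice, and EOL and PHRASE are gone from
> the lexer entirely:
>
>    grammar Phrase;
>
>    // hypothetical parser rule: words, separators and the optional
>    // trailing whitespace are assembled here ('?'), not in the lexer
>    phrase : WORD (WS WORD)* WS? NEWLINE ;
>
>    WS      : (' '|'\t')+ ;
>    NEWLINE : '\r'? '\n' ;
>    WORD    : (LETTER|DIGIT)+ ;
>
>    // fragments are helpers for other lexer rules; never emitted as tokens
>    fragment DIGIT  : '0'..'9' ;
>    fragment LETTER : 'a'..'z' | 'A'..'Z' ;
>
> None of the three remaining token rules can match empty input, and each
> starts with a different set of characters, so the lexer never has to pick
> between overlapping rules; deciding whether a WS is followed by another
> WORD is left to the parser's look-ahead.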
>
> Dave