[antlr-interest] Look-ahead problem parsing phrase?

Dave Dutcher dave at tridecap.com
Tue Jun 30 06:26:18 PDT 2009


> From: Sean O'Dell
> Subject: Re: [antlr-interest] Look-ahead problem parsing phrase?
>
> Thanks ... that really did help. I think I didn't realize how much better
> the parser is than the lexer at looking-ahead. It makes much more sense to
> me now, though I'm not yet sure how I will deal with tokenizing optional
> trailing whitespace.
>
> I think, though, if I understand correctly: the lexer rule I build to
> consume that should not be allowed to be empty. However, if it's optional,
> I should indicate that in a parser rule and not the lexer rule.
>
> Maybe another way to say this is (and maybe it's been said, but I didn't
> "hear" it correctly): lexer rules should strive to be completely
> unambiguous and should match something, preferably immediately from the
> left. Parser rules can have more complex look-ahead patterns.


The lexer is really just as powerful as the parser. The big difference is
that Antlr chooses which lexer rule to start matching with, whereas in a
parser grammar you specify the starting rule yourself.

You originally posted this lexer grammar:

    WS : (' '|'\t')+;
    DIGIT : ('0'..'9');
    LETTER : ('a'..'z'|'A'..'Z'); 
    NEWLINE : '\r'? '\n';
    WORD : (LETTER|DIGIT)+;
    EOL : WS? NEWLINE?;
    PHRASE : WORD (WS WORD)*;

What you have to remember is that the lexer runs first and completely
tokenizes the character stream before the parser runs.  Also Antlr decides
which lexer token to match.  The way I think of it is that basically Antlr
adds another lexer rule that looks like this:

    START : (WS | DIGIT | LETTER | NEWLINE | WORD | EOL | PHRASE);

Now you can see how having to choose between PHRASE and WORD could be
tricky: both start by matching a WORD. If a rule is marked as a fragment,
it won't be included in the "START" rule.
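
As a rough, untested sketch of how that could look for your grammar
(assuming ANTLR 3 syntax; the grammar name "Phrase" and the parser rule
"line" are names I made up for illustration, and EOL is dropped because it
can match the empty string):

    grammar Phrase;

    // Parser rule: the phrase and the optional trailing whitespace are
    // expressed here, where you control which rule is the starting point.
    line : WORD (WS WORD)* WS? NEWLINE ;

    // Lexer rules: each one must match at least one character.
    WS      : (' '|'\t')+ ;
    NEWLINE : '\r'? '\n' ;
    WORD    : (LETTER|DIGIT)+ ;

    // Fragments are only helpers for other lexer rules. They never
    // become tokens on their own, so they stay out of the "START" rule.
    fragment DIGIT  : '0'..'9' ;
    fragment LETTER : 'a'..'z' | 'A'..'Z' ;

With DIGIT and LETTER as fragments and PHRASE moved into the parser, the
lexer only has to choose between WS, NEWLINE and WORD, and each of those
starts with a different set of characters.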

That said, this might not have been your problem, and Jim's solution might
be all you need.

Dave




