[antlr-interest] how to set lookahead in v3
Johannes Luber
jaluber at gmx.de
Fri May 4 04:17:27 PDT 2007
Markus Kuhla wrote:
> Hi Johannes, Hi all,
>
> What I expected from LL(*) was that it can also decide whether to leave the current rule, even if the next (two) tokens match an alternative of the current rule.
>
> I give you more details, because your proposal does not work. The right alternative is much higher in the tree.
>
> side : (section)+ EOF;
> section : blanks? (separator | textsection) NEWLINE;
> separator : DASH DASH blanks? NEWLINE;
> textsection : (textline_part)+;
> textline_part : '/*' commentline+ ('*/')?;
> commentline : NEWLINE blanks? any_char_not_dash;
>
> input = 'text /* COMMENT\n --\n NOTCOMMENT'
>
> So the parser has reached the point where it must decide whether to continue with a second commentline (which could fit if it considers only NEWLINE blanks?), but it should recognize the dashes. Then it should end the (commentline)+ loop, go back to section and decide that a separator comes next!
>
> Do you know what I mean? I hope you can give me a good hint.
>
> Thank you all for your great work here!
> Markus
I've created the following grammar from your snippet:
side : section+ EOF;
section : BLANKS? (separator | textsection) NEWLINE;
separator : DASH DASH BLANKS? NEWLINE;
textsection : textline_part+;
textline_part : '/*' commentline+ '*/'?;
commentline : NEWLINE BLANKS? ~(DASH | NEWLINE | BLANKS);
BLANKS: (' ' | '\t')+ ;
NEWLINE: ('\r' '\n'?| '\n');
DASH: '-';
Note that I changed any_char_not_dash to also exclude NEWLINE and BLANKS
in order to remove an ambiguity. This shouldn't affect what the grammar
recognizes. Nonetheless, one ambiguity remains:
"NEWLINE BLANKS? /* */" can be matched either by commentline or by two
consecutive sections. The problem is that the comment in textline_part
has an optional '*/'. Removing the ? clears things up, but changes the
recognized language. This behaviour may be due to the fact that you
haven't given us the entire grammar file. Since I know that you can't
do that, my advice is to look at how the C# language specification
(the ECMA-334 standard) defines multi-line comments.
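
To make the remaining ambiguity concrete, here is a minimal Python sketch
(not ANTLR output, and not how ANTLR computes its lookahead). It only
checks, by hand, which of the two paths fits a given token sequence right
after a commentline. The token names COMMENT_OPEN and COMMENT_CLOSE are
made up for the implicit '/*' and '*/' tokens, and the vocabulary is
assumed to contain just the five token types of the grammar above.

```python
# Token vocabulary of the grammar above. COMMENT_OPEN / COMMENT_CLOSE are
# hypothetical names for the implicit tokens created by '/*' and '*/'.
VOCABULARY = {"BLANKS", "NEWLINE", "DASH", "COMMENT_OPEN", "COMMENT_CLOSE"}

# ~(DASH | NEWLINE | BLANKS) matches every token type that is not listed.
NOT_SET = VOCABULARY - {"DASH", "NEWLINE", "BLANKS"}


def skip_optional_blanks(tokens, pos):
    """Consume BLANKS? and return the new position."""
    if pos < len(tokens) and tokens[pos] == "BLANKS":
        return pos + 1
    return pos


def fits_commentline(tokens):
    """commentline : NEWLINE BLANKS? ~(DASH | NEWLINE | BLANKS);"""
    if not tokens or tokens[0] != "NEWLINE":
        return False
    pos = skip_optional_blanks(tokens, 1)
    return pos < len(tokens) and tokens[pos] in NOT_SET


def fits_next_section(tokens):
    """NEWLINE closes the current section; the next section then starts
    with BLANKS? followed by DASH (separator) or '/*' (textsection)."""
    if not tokens or tokens[0] != "NEWLINE":
        return False
    pos = skip_optional_blanks(tokens, 1)
    return pos < len(tokens) and tokens[pos] in {"DASH", "COMMENT_OPEN"}


def viable_paths(tokens):
    """List every path that fits the given lookahead tokens."""
    paths = []
    if fits_commentline(tokens):
        paths.append("commentline")
    if fits_next_section(tokens):
        paths.append("next section")
    return paths


dash_case = ["NEWLINE", "BLANKS", "DASH", "DASH"]
comment_case = ["NEWLINE", "BLANKS", "COMMENT_OPEN", "COMMENT_CLOSE"]

print(viable_paths(dash_case))     # ['next section']
print(viable_paths(comment_case))  # ['commentline', 'next section']
```

The dash line from your input fits only one path, so the third token is
enough to leave the commentline loop. The '/*' case fits both paths,
which is the ambiguity described above.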
Best regards,
Johannes Luber