[antlr-interest] how to set lookahead in v3

Johannes Luber jaluber at gmx.de
Fri May 4 04:17:27 PDT 2007

Markus Kuhla wrote:
> Hi Johannes, Hi all,
> What I expected from LL(*) was that it could also decide whether to leave the current rule, even if the next (two) tokens match an alternative of the current rule.
> Here are more details, because your proposal does not work. The right alternative is much higher in the tree.
> side           : (section)+  EOF;
> section        : blanks?  (separator | textsection) NEWLINE;
> separator      : DASH  DASH  blanks?  NEWLINE;
> textsection    : (textline_part)+;
> textline_part  : '/*'  commentline+  ('*/')?;
> commentline    : NEWLINE  blanks?  any_char_not_dash;
> input = 'text /* COMMENT\n  --\n NOTCOMMENT'
> So the parser has reached the point where it must decide whether to continue with a second commentline (which could fit if it considers only NEWLINE blanks?), but it should recognize the dashes. Then it should end the commentline ()+ loop, go back to section and decide that a separator comes next!
> Do you know what I mean? I hope you can give me a good hint.
> Thank you all for your great work here!
> Markus

I've created the following grammar from your snippet:

side           : section+  EOF;
section        : BLANKS?  (separator | textsection) NEWLINE;
separator      : DASH  DASH  BLANKS?  NEWLINE;
textsection    : textline_part+;
textline_part  : '/*' commentline+ '*/'?;
commentline    : NEWLINE  BLANKS?  ~(DASH | NEWLINE | BLANKS);

BLANKS: (' ' | '\t')+ ;
NEWLINE: ('\r' '\n'?| '\n');
DASH: '-';
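To see which tokens the parser actually has to decide on, here is a minimal Python sketch of a lexer that mimics the three lexer rules above plus the '/*' and '*/' literals. The snippet has no rule for ordinary text, so the WORD token (and the names COMMENT_START/COMMENT_END) are hypothetical additions for illustration only; this is not what ANTLR generates.

```python
import re

# Token rules mirroring BLANKS, NEWLINE and DASH from the grammar above.
# COMMENT_START/COMMENT_END stand for the '/*' and '*/' literals.
# WORD is a hypothetical catch-all for text the snippet has no rule for.
TOKEN_SPEC = [
    ("COMMENT_START", r"/\*"),
    ("COMMENT_END", r"\*/"),
    ("BLANKS", r"[ \t]+"),
    ("NEWLINE", r"\r\n?|\n"),
    ("DASH", r"-"),
    ("WORD", r"(?:(?!/\*|\*/)[^ \t\r\n-])+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs, trying the rules in order (first match wins)."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no token rule matches at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for tok in tokenize("text /* COMMENT\n  --\n NOTCOMMENT"):
        print(tok)
```

For the input from the original question, the interesting stretch after COMMENT is NEWLINE BLANKS DASH DASH NEWLINE: the parser must look past the NEWLINE and the optional BLANKS before the DASH tells it that no further commentline follows.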

Note that I changed any_char_not_dash so that it also excludes NEWLINE
and BLANKS, to remove an ambiguity. This shouldn't affect what the
grammar recognizes. Nonetheless, one ambiguity remains:
"NEWLINE BLANKS? /* */" can be matched either by commentline or by the
NEWLINE that ends one section followed by the start of the next section.
The problem is that the comment in textline_part has an optional '*/'.
Removing the ? resolves the ambiguity, but changes the recognized
language. The reason for this behaviour may be that you haven't given us
the entire grammar file. As I know that you can't do that, my advice is
to look at how the C# grammar specification in the ECMA-334 standard
implements multiline comments.
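The remaining ambiguity can be made concrete with a small hand-written decision function, sketched here in Python under the assumption that tokens are plain type strings such as "NEWLINE", "BLANKS", "DASH" and a hypothetical "COMMENT_START" for '/*'. It models the question the parser asks at each iteration of commentline+ ("does another commentline start here?") by checking the rule commentline : NEWLINE BLANKS? ~(DASH | NEWLINE | BLANKS); directly. It is an illustration, not the decision ANTLR computes.

```python
NEWLINE, BLANKS, DASH = "NEWLINE", "BLANKS", "DASH"


def predicts_commentline(types, i):
    """Return True if another commentline can start at token index i.

    Mirrors  commentline : NEWLINE BLANKS? ~(DASH | NEWLINE | BLANKS);
    At most three tokens are inspected: NEWLINE, an optional BLANKS,
    and the token that must be none of DASH, NEWLINE or BLANKS.
    """
    if i >= len(types) or types[i] != NEWLINE:
        return False
    j = i + 1
    if j < len(types) and types[j] == BLANKS:
        j += 1
    return j < len(types) and types[j] not in (DASH, NEWLINE, BLANKS)


if __name__ == "__main__":
    # The case from the question: the dashes end the loop.
    print(predicts_commentline([NEWLINE, BLANKS, DASH, DASH, NEWLINE], 0))
    # Ordinary comment text: the loop continues.
    print(predicts_commentline([NEWLINE, BLANKS, "WORD"], 0))
    # The ambiguous case: NEWLINE BLANKS? '/*' fits a commentline, but it
    # could equally be the end of one section and the start of the next.
    print(predicts_commentline([NEWLINE, BLANKS, "COMMENT_START"], 0))
```

The first call prints False and the other two print True. The third case is the overlap described above: because '*/' is optional, nothing in the lookahead distinguishes "continue the comment" from "start a new textsection", so any fixed choice (such as a greedy loop continuing) silently picks one reading.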

Best regards,
Johannes Luber
