[antlr-interest] More on Lexer 2-char seq handling
David-Sarah Hopwood
david-sarah at jacaranda.org
Mon Oct 12 19:17:05 PDT 2009
Graham Wideman wrote:
> Hi folks:
>
> Further to the discussion on lexer matching sequence that should stop before some multi-character pattern:
>
> I read Kirby's post with interest, including the list discussions pointed to. I'm not sure what to make of it. The oddity to me is that ANTLR *almost* generates the right things:
>
> 1. mTokens does the right thing.
>
> 2. The lexer rule code that matches/consumes the string in question does look ahead and see the error it would make if it consumed the end-before-this pattern.
>
> 3. ANTLR just doesn't generate the code to look ahead and *predict* that it should *stop*, it only looks ahead enough to predict which alternative *might* succeed based on the first character.
Yes, I get the impression that ANTLR lexers use a weaker recognition
strategy than ANTLR parsers. The problems seem to occur when you try to do
something in a lexer that would work in a parser (with tokens in place of
characters) only because of the parser's stronger recognition strategy.
However, I haven't been able to find documentation of what exactly the
difference is -- the description of LL(*) in the 'Definitive Guide'
chapter 11 does not seem to make a distinction between lexing and parsing.
(It does say that ANTLR does not generate ambiguity warnings for a lexer
that it would generate for a parser, instead preferring rules that are
specified first in the grammar. But that doesn't seem to be relevant to
this lookahead issue.)
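
To make point 3 of the quoted message concrete, here is a minimal sketch in
Python of the two loop strategies being contrasted. It is an illustration
only, not ANTLR's generated code: the function names and the "]]" stop
pattern are hypothetical, chosen just to show how a one-character prediction
can commit to an alternative that a two-character prediction would reject.

```python
def scan_one_char_prediction(text, stop="]]"):
    """Hypothetical loop that picks an alternative from ONE character.

    Returns the index where scanning ended, or raises ValueError when the
    chosen alternative turns out not to match.
    """
    i = 0
    while i < len(text):
        if text[i] != stop[0]:
            # Alternative 1: a character that cannot start the stop pattern.
            i += 1
        else:
            # Alternative 2 is chosen on the first character alone:
            # stop[0] that is NOT the start of the full stop pattern.
            if text.startswith(stop, i):
                # The mismatch is only noticed after committing.
                raise ValueError(f"mismatch at index {i + 1}")
            i += 1
    return i


def scan_full_prediction(text, stop="]]"):
    """Hypothetical loop that looks ahead far enough to predict the exit.

    Checks for the whole stop pattern before consuming anything, so it
    stops cleanly just before the pattern.
    """
    i = 0
    while i < len(text) and not text.startswith(stop, i):
        i += 1
    return i
```

On the input "ab]]cd" the first function raises an error at the second "]",
while the second stops at index 2, just before the pattern. On "a]b", which
contains no stop pattern, both consume the whole string.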
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com