[antlr-interest] solution to lexer issue
Gavin Lambert
antlr at mirality.co.nz
Fri Oct 26 19:52:46 PDT 2007
At 14:15 27/10/2007, Terence Parr wrote:
>You only predict on valid input; I think errors are left to
>"if ! valid prediction"; i.e., the opposite of "good" must
>be "bad". No need to look for "bad" explicitly.
Yep; only once you've been through the whole list of tokens and
can't find anything to generate do you concede defeat, report an
error, and then maybe try error-recovery mechanisms like skipping
a character and trying again.
>I think a proof would abstract the issue to:
>
>r : x .* | .* ;
>
>where x is any alternative in any grammar. Static analysis
>involves using more and more lookahead until you can
>distinguish between alts. x and .* look identical by
>definition so you look past x hoping to distinguish.
>Ooops. .* follows. At this point, you concede defeat.
>Only solution is to predict with entire x; i.e., you
>must backtrack like lex does or using Idle Jim's syn pred
>solution ;)
Ah, I see what you meant now.
>Now, had I done this sketch of a proof earlier I would
>realize that assuming .* follows is overly general; well,
>actually, it's just wrong.
Given that alts *don't* have inherent precedence (or at least I
don't think they do), would it behave better if you tweaked it to
look more like this:
r : x .* | ~x .* ;
...? Not saying that's right either (because you still don't want
to consume erroneous input until you have no other choice), but it
might be easier to get to from what you've got at the moment.
>I *hope* the change doesn't mess up stuff like keyword
>vs ID. Could involve a lot of thought / testing on my
>part to verify I haven't screwed up something else.
Simple solution for that: write lots of unit tests :) (Especially
lexer-only tests. I've noticed a lot of the unit test snippets
you've posted in the past have tended to be combined tests, even
when using a trivial parser. Possibly that was just coincidental.)
More information about the antlr-interest
mailing list