[antlr-interest] lexer problem
Hendrik Maryns
qwizv9b02 at sneakemail.com
Mon Nov 3 07:08:46 PST 2008
Gavin Lambert wrote:
> At 10:07 1/11/2008, Robert Soule wrote:
> >I was hoping someone might be able to help me out. I have the
> >following grammar:
> >
> >grammar Test;
> >start: '[' AB ']' | A;
> >A: '[a]';
> >AB: ('a' | 'b')+;
> >
> >In English, there is a keyword in my language '[a]', and
> >all other statements are of the form: [(a|b)+]. I tried this
> >with two test cases:
> >
> >test [ab] fails unexpectedly (no viable alternative)
> >test [ba] succeeds
> >
> >I believe that the lexer sees a '[' character followed by
> >an 'a' character, and expects a ']' next, even though
> >'a' or 'b' could also be valid next input characters. Has
> >anyone had any experience with this type of issue?
>
> Yeah, this is a common prefix problem :) (By which I both mean
> that it's a common problem and that it's a problem with common
> prefixes.)
>
> Essentially what you've got above are the following lexer rules:
>
> T15: '[';
> T16: ']';
> A: '[a]';
> AB: ('a' | 'b')+;
>
> To decide between these top-level alternatives, ANTLR essentially
> builds a least-lookahead disambiguation table. With only one
> character of lookahead, it can instantly recognise the difference
> between T16, AB, and *either* of T15 and A, but it needs at least
> two characters to tell between T15 and A. It never checks that
> third character, which is what it'd need to look at to decide
> between a single A vs. a T15 *followed by* an AB.
>
> To deal with this kind of problem, you need to manually force the
> necessary lookahead. You can do this by combining the rules with
> common prefixes:
>
> fragment A: '[a]';
> LSQUARE: '[' ('a]' { $type = A; })? ;
>
> Another way of writing it:
>
> fragment A: '[a]';
> LSQUARE
> : (A) => A { $type = A; }
> | '['
> ;
>
> (Either way, of course, you'll need to refer to LSQUARE in your
> parser rules after this.)
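
To check that I follow the lookahead explanation, here is a rough Python
model of it. It is only a sketch of the behaviour described above, not the
code ANTLR generates; the names lex, two_char_lookahead and with_predicate
are made up for the illustration. The first rule commits to A after seeing
"[a", the second tries the whole "[a]" before falling back to a bare '[',
which is how I read the combined LSQUARE rule.

```python
def lex(text, square_rule):
    """Split text into token names; square_rule handles input starting with '['."""
    tokens, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch == '[':
            name, pos = square_rule(text, pos)
        elif ch == ']':
            name, pos = 'T16', pos + 1
        elif ch in 'ab':
            # AB: ('a' | 'b')+ consumes the longest run of a/b characters.
            while pos < len(text) and text[pos] in 'ab':
                pos += 1
            name = 'AB'
        else:
            raise ValueError(f"unexpected {ch!r} at {pos}")
        tokens.append(name)
    return tokens


def two_char_lookahead(text, pos):
    # Model of the original grammar: after "[a" the decision is already A,
    # and the third character is only looked at while matching, too late.
    if text[pos + 1:pos + 2] == 'a':
        if text[pos + 2:pos + 3] != ']':
            raise ValueError(f"committed to A, mismatch at {pos + 2}")
        return 'A', pos + 3
    return 'T15', pos + 1


def with_predicate(text, pos):
    # Model of "(A) => A { $type = A; } | '['": check that all of "[a]"
    # is present before choosing A, otherwise emit a plain '['.
    if text.startswith('[a]', pos):
        return 'A', pos + 3
    return 'LSQUARE', pos + 1
```

In this model lex('[ba]', two_char_lookahead) goes through, while
lex('[ab]', two_char_lookahead) raises an error at the 'b', matching the two
test cases reported above. With with_predicate, '[ab]' lexes as a bracket,
an AB and a closing bracket, and '[a]' is still a single A.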
This looks like a promising solution to my problem as well (see other
threads). Could you explain further what the line
: (A) => A { $type = A; }
does? I searched the site for an explanation of ‘=>’ but couldn’t find
more than ‘semantic predicate’ and ‘always execute predicate’.
Thanks, H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html