[antlr-interest] lexer problem

Hendrik Maryns qwizv9b02 at sneakemail.com
Mon Nov 3 07:08:46 PST 2008


Gavin Lambert wrote:
> At 10:07 1/11/2008, Robert Soule wrote:
>  >I was hoping someone might be able to help me out. I have the
>  >following grammar:
>  >
>  >grammar Test;
>  >start: '[' AB ']' | A;
>  >A: '[a]';
>  >AB: ('a' | 'b')+;
>  >
>  >In English, there is a keyword in my language '[a]', and
>  >all other statements are of the form: [(a|b)+]. I tried this
>  >with two test cases:
>  >
>  >test [ab] fails unexpectedly (no viable alternative)
>  >test [ba] succeeds
>  >
>  >I believe that the lexer sees a '[' character followed by
>  >an 'a' character, and expects a ']' next, even though
>  >'a' or 'b' could also be valid next input characters. Has
>  >anyone had any experience with this type of issue?
> 
> Yeah, this is a common prefix problem :)  (By which I both mean 
> that it's a common problem and that it's a problem with common 
> prefixes.)
> 
> Essentially what you've got above are the following lexer rules:
> 
> T15: '[';
> T16: ']';
> A: '[a]';
> AB: ('a' | 'b')+;
> 
> To decide between these top-level alternatives, ANTLR essentially 
> builds a least-lookahead disambiguation table.  With only one 
> character of lookahead, it can instantly recognise the difference 
> between T16, AB, and *either* of T15 and A, but it needs at least 
> two characters to tell between T15 and A.  It never checks that 
> third character, which is what it'd need to look at to decide 
> between a single A vs. a T15 *followed by* an AB.
> 
> To deal with this kind of problem, you need to manually force the 
> necessary lookahead.  You can do this by combining the rules with 
> common prefixes:
> 
> fragment A: '[a]';
> LSQUARE: '[' ('a]' { $type = A; })? ;
> 
> Another way of writing it:
> 
> fragment A: '[a]';
> LSQUARE
>    :  (A) => A { $type = A; }
>    |  '['
>    ;
> 
> (Either way, of course, you'll need to refer to LSQUARE in your 
> parser rules after this.)

This looks like a promising solution to my problem as well (see other threads).
Could you explain further what the line
    :  (A) => A { $type = A; }
does?  I searched the site for an explanation of ‘=>’ but couldn’t find
anything beyond the terms ‘semantic predicate’ and ‘always execute predicate’.

Thanks, H.
-- 
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
