[antlr-interest] lexer problem

Robert Soule robert.soule at gmail.com
Mon Nov 3 07:59:48 PST 2008


Thank you for your help, but I'm still having some problems with the
suggested re-write:

start
 :  LSQUARE AB RSQUARE
 |  A
 ;

AB : ('a'|'b')+ ;

fragment A : '[a]' ;

LSQUARE
 :  (A) => A { $type = A; }
 |  '['
 ;

RSQUARE : ']' ;


When I test with the input "[ba]" it passes, but "[ab]" fails.

thank you,
Robert


On Fri, Oct 31, 2008 at 9:40 PM, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 10:07 1/11/2008, Robert Soule wrote:
>>I was hoping someone might be able to help me out. I have the
>>following grammar:
>>
>>grammar Test;
>>start: '[' AB ']' | A;
>>A: '[a]';
>>AB: ('a' | 'b')+;
>>
>>In English, there is a keyword in my language '[a]', and
>>all other statements are of the form: [(a|b)+]. I tried this
>>with two test cases:
>>
>>test [ab] fails unexpectedly (no viable alternative)
>>test [ba] succeeds
>>
>>I believe that the lexer sees a '[' character followed by
>>an 'a' character, and expects a ']' next, even though
>>'a' or 'b' could also be valid next input characters. Has
>>anyone had any experience with this type of issue?
>
> Yeah, this is a common prefix problem :)  (By which I mean both that it's a
> common problem and that it's a problem with common prefixes.)
>
> Essentially what you've got above are the following lexer rules:
>
> T15: '[';
> T16: ']';
> A: '[a]';
> AB: ('a' | 'b')+;
>
> To decide between these top-level alternatives, ANTLR essentially builds a
> least-lookahead disambiguation table.  With only one character of lookahead,
> it can instantly recognise the difference between T16, AB, and *either* of
> T15 and A, but it needs at least two characters to tell between T15 and A.
> It never checks that third character, which is what it'd need to look at to
> decide between a single A vs. a T15 *followed by* an AB.
>
> To deal with this kind of problem, you need to manually force the necessary
> lookahead.  You can do this by combining the rules with common prefixes:
>
> fragment A: '[a]';
> LSQUARE: '[' ('a]' { $type = A; })? ;
>
> Another way of writing it:
>
> fragment A: '[a]';
> LSQUARE
>  :  (A) => A { $type = A; }
>  |  '['
>  ;
>
> (Either way, of course, you'll need to refer to LSQUARE in your parser rules
> after this.)
>
>

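As a rough illustration of the lookahead explanation quoted above, here is a
toy model in Python. It is only a sketch under stated assumptions: it is not
ANTLR's actual prediction algorithm or its generated code, and the names
tokenize, LexError and the lookahead parameter are made up for this example.
With lookahead=2 the toy lexer commits to A as soon as it sees '[' followed by
'a', which is the behaviour described for the original grammar. With
lookahead=3 it also checks the third character before committing.

```python
class LexError(Exception):
    """Raised when the toy lexer commits to a rule that then fails to match."""


def match_ab(text, pos):
    """Return the end index of the longest run of 'a'/'b' starting at pos."""
    end = pos
    while end < len(text) and text[end] in "ab":
        end += 1
    return end


def tokenize(text, lookahead):
    """Split text into (type, lexeme) pairs.

    lookahead=2 commits to A on seeing '[' followed by 'a' (hypothetical
    stand-in for the two-character decision described in the thread).
    lookahead=3 commits to A only when the full '[a]' is present.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "[":
            if lookahead >= 3:
                wants_a = text.startswith("[a]", pos)
            else:
                wants_a = text.startswith("[a", pos)
            if wants_a:
                # Committed to A: the whole keyword must now match.
                if not text.startswith("[a]", pos):
                    raise LexError(f"expected ']' at index {pos + 2}")
                tokens.append(("A", "[a]"))
                pos += 3
            else:
                tokens.append(("LSQUARE", "["))
                pos += 1
        elif ch == "]":
            tokens.append(("RSQUARE", "]"))
            pos += 1
        elif ch in "ab":
            end = match_ab(text, pos)
            tokens.append(("AB", text[pos:end]))
            pos = end
        else:
            raise LexError(f"unexpected character {ch!r} at index {pos}")
    return tokens
```

In this model, tokenize("[ba]", 2) yields LSQUARE, AB, RSQUARE, while
tokenize("[ab]", 2) raises LexError because the lexer has already committed to
A by the time it reaches the 'b'. tokenize("[ab]", 3) succeeds, which is the
effect the forced lookahead in the suggested rewrites is meant to achieve.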
