[antlr-interest] Why does syntactic predicate not take effect?

Mon Nov 10 15:16:42 PST 2008

Britta Kiera wrote:
> Don't know if the first reply was sent. Trying it a second time.
>
> > At 23:32 10/11/2008, Britta Kiera wrote:
> > >The lexer was supposed to generate a NAMES token for the feature
> > >name sequence. The definition below shows an approach that I tried
> > >to accomplish this. This approach didn't work. The lexer never
> > >generated a NAMES token although I tried
> > >to enforce this using a syntactic predicate. I solved this problem
> > >in the parser but I'd like to understand why the
> > >syntactic predicate does not take effect. Can somebody explain this
> > >to me?
> >
> > Are you using the interpreter or the debugger (or a "real" compiled
> > program)? Because the interpreter doesn't evaluate predicates.
>
> I'm not using ANTLRWorks. I'm using the ANTLR IDE Eclipse plugin with
> the ANTLR 3.1.1 runtime to generate the lexer code. Then I run the main
> method of the generated lexer using the Eclipse "Run" command. The main
> method of the grammar that I sent with my first mail contains a short test.
>
> > >NAMES:
> > >    ;
> >
> > You need to make this a fragment rule.  Otherwise you've got a top-level
> > lexer rule which can successfully match nothing at all, which is Bad.
> >  (Since that way lies infinite loops.)
>
> In the grammar below NAMES has been made a fragment. But still it produces
> the same output as before that doesn't contain a NAMES token:

Actually, I believe the preferred way is to declare this, and any other
virtual token, in a tokens {} section, which should appear after the
options {} section and before your @header {} section.

>
> Token:  WHITE(99) >   <
> Token:  IDENT( 0) >plugins<
> Token:  DOT( 0) >.<
> Token:  IDENT( 0) >navigation<
> Token:  DOT( 0) >.<
> Token:  IDENT( 0) >XRefs<
> Token:  WHITE(99) > <
> Token:  IDENT( 0) >Outline<
> Token:  WHITE(99) > <
> Token:  IDENT( 0) >GoTo<
> Token:  WHITE(99) > <

Here, I believe, is the problem; see the comments near your IDENT rule below.

> Token:  LB( 0) >{<
> Token:  RB( 0) >}<
> Token:  -1( 0) >null<
>
> Regards,
> Nukiti
>
> ========================= modified ANTLR lexer start ======================
> lexer grammar SimpleLex;
>
> options {
>     language = Java;
> }
>
> @header {
> package test.antlr;
>
> import java.io.StringReader;
> }
>
> @members {
>     public static void main(String args[]) throws Exception {
>         String      input = "   plugins.navigation.XRefs Outline GoTo {}";
>         CharStream  cs    = new ANTLRStringStream(input);
>         SimpleLex   lex   = new SimpleLex(cs);
>        
>         Token t;
>         do {
>             String type = "?";
>             t=lex.nextToken();
>             switch(t.getType()) {
>                 case IDENT: type = "IDENT"; break;
>                 case NAMES: type = "NAMES"; break;
>                 case DOT  : type = "DOT"  ; break;
>                 case WHITE: type = "WHITE"; break;
>                 case LB   : type = "LB"   ; break;
>                 case RB   : type = "RB"   ; break;
>                 default   : type = Integer.toString(t.getType()); break;
>             }
>             System.out.printf("Token: %6s(%2d) >%s<\n", type,
> t.getChannel(), t.getText()); }
>         while(t.getType() != -1);
>     }
> }
>
> IDENT
>     : (ID (WS ID)+)=> ID (WS ID)+ {$type = NAMES;}

Recall that ANTLR lexer loops are greedy: they consume the longest possible
string, and once the lexer has committed to another iteration of a loop, no
other alternative is tried. Your predicate wants a list of WS ID pairs. Your
input has some WS ID pairs followed by a WS LB pair. After GoTo the loop sees
the WS, commits to another iteration, and then finds '{' where it expects an
ID, so (WS ID)+ does not match and the predicate fails.

Remove the blank before the { in your test data and you will see what I mean.

So try this IDENT rule instead:

IDENT
    : ID ( ((WS ID)=>(WS ID))+ {$type = NAMES;} )?
    ;
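
To make the difference between the two rules concrete, here is a small
hand-written Java sketch. It is not the code ANTLR generates; it only
simulates the behaviour described above, and the class and method names
(LoopLookaheadSketch, committedLoop, guardedLoop) are made up for this
illustration. committedLoop mimics ID (WS ID)+, where seeing whitespace
commits the loop to another iteration. guardedLoop mimics
ID ( ((WS ID)=>(WS ID))+ )?, where each iteration is entered only if a
complete WS ID pair lies ahead.

```java
class LoopLookaheadSketch {
    static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // Length of an ID (LETTER (LETTER|DIGIT)*) starting at pos, or 0 if none.
    static int id(String s, int pos) {
        if (pos >= s.length() || !isLetter(s.charAt(pos))) return 0;
        int i = pos + 1;
        while (i < s.length()
                && (isLetter(s.charAt(i)) || Character.isDigit(s.charAt(i)))) i++;
        return i - pos;
    }

    // Length of a WS run starting at pos, or 0 if none.
    static int ws(String s, int pos) {
        int i = pos;
        while (i < s.length() && " \r\t\f\n".indexOf(s.charAt(i)) >= 0) i++;
        return i - pos;
    }

    // Simulates ID (WS ID)+ : whitespace alone decides "iterate again".
    // Returns the end index of the match, or -1 if the match fails.
    static int committedLoop(String s, int pos) {
        int n = id(s, pos);
        if (n == 0) return -1;
        int i = pos + n;
        int pairs = 0;
        while (ws(s, i) > 0) {
            i += ws(s, i);
            int m = id(s, i);
            if (m == 0) return -1;   // committed to the iteration, but no ID follows
            i += m;
            pairs++;
        }
        return pairs > 0 ? i : -1;
    }

    // Simulates ID ( ((WS ID)=>(WS ID))+ )? : check the whole pair first.
    // Returns the end index of the match, or -1 if there is no leading ID.
    static int guardedLoop(String s, int pos) {
        int n = id(s, pos);
        if (n == 0) return -1;
        int i = pos + n;
        while (true) {
            int w = ws(s, i);
            int m = (w > 0) ? id(s, i + w) : 0;
            if (m == 0) break;       // guard fails: leave the loop, keep the match so far
            i += w + m;
        }
        return i;
    }

    public static void main(String[] args) {
        String withBlank = "XRefs Outline GoTo {}";
        String noBlank   = "XRefs Outline GoTo{}";
        System.out.println(committedLoop(withBlank, 0)); // -1: fails on the WS before '{'
        System.out.println(committedLoop(noBlank, 0));   // 18: matches up to "GoTo"
        System.out.println(guardedLoop(withBlank, 0));   // 18: stops before the trailing WS
    }
}
```

In the guarded version the trailing blank is left in the input, so in the
real lexer it would be picked up by the WHITE rule as a separate token.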

>     | ID
>     ;
>    
> WHITE
>     : WS { $channel = HIDDEN; }
>     ;
>
> LB  : '{' ;
> RB  : '}' ;
> DOT : '.' ;
>
> fragment NAMES :;
> fragment WS    : (' '|'\r'|'\t'|'\u000C'|'\n')+;
> fragment ID    : LETTER (LETTER|DIGIT)*;
> fragment DIGIT : '0'..'9';
> fragment LETTER: 'A'..'Z' | 'a'..'z' | '_';
