[antlr-interest] Why does syntactic predicate not take effect?
John B. Brodie
jbb at acm.org
Mon Nov 10 15:16:42 PST 2008
Britta Kiera wrote:
> Don't know if the first reply was sent. Trying it a second time.
>
> > At 23:32 10/11/2008, Britta Kiera wrote:
> > >The lexer was supposed to generate a NAMES token for the feature
> > >name sequence. The definition below shows an approach that I tried
> > >to accomplish this. This approach didn't work. The lexer never
> > >generated a NAMES token although I tried
> > >to enforce this using a syntactic predicate. I solved this problem
> > >in the parser but I'd like to understand why the
> > >syntactic predicate does not take effect. Can somebody explain this
> > >to me?
> >
> > Are you using the interpreter or the debugger (or a "real" compiled
> > program)? Because the interpreter doesn't evaluate predicates.
>
> I'm not using ANTLRWorks. I'm using the ANTLR IDE Eclipse plugin with
> the ANTLR 3.1.1 runtime to generate the lexer code. Then I run the main
> method of the generated lexer using the Eclipse "Run" command. The main
> method of the grammar that I sent with my first mail contains a short test.
>
> > >NAMES:
> > > ;
> >
> > You need to make this a fragment rule. Otherwise you've got a top-level
> > lexer rule which can successfully match nothing at all, which is Bad.
> > (Since that way lies infinite loops.)
>
> In the grammar below NAMES has been made a fragment. But still it produces
> the same output as before that doesn't contain a NAMES token:
Actually, I believe the preferred way is to specify this and any other virtual
token in a tokens {} section, which should appear after the options {} section
and before your @header {} section.
>
> Token: WHITE(99) > <
> Token: IDENT( 0) >plugins<
> Token: DOT( 0) >.<
> Token: IDENT( 0) >navigation<
> Token: DOT( 0) >.<
> Token: IDENT( 0) >XRefs<
> Token: WHITE(99) > <
> Token: IDENT( 0) >Outline<
> Token: WHITE(99) > <
> Token: IDENT( 0) >GoTo<
> Token: WHITE(99) > <
Here, I believe, is the problem: see my comments near your IDENT rule below.
> Token: LB( 0) >{<
> Token: RB( 0) >}<
> Token: -1( 0) >null<
>
> Regards,
> Nukiti
>
> ========================= modified ANTLR lexer start ======================
> lexer grammar SimpleLex;
>
> options {
> language = Java;
> }
>
> @header {
> package test.antlr;
>
> import java.io.StringReader;
> }
>
> @members {
> public static void main(String args[]) throws Exception {
> String input = " plugins.navigation.XRefs Outline GoTo {}";
> CharStream cs = new ANTLRStringStream(input);
> SimpleLex lex = new SimpleLex(cs);
>
> Token t;
> do {
> String type = "?";
> t=lex.nextToken();
> switch(t.getType()) {
> case IDENT: type = "IDENT"; break;
> case NAMES: type = "NAMES"; break;
> case DOT : type = "DOT" ; break;
> case WHITE: type = "WHITE"; break;
> case LB : type = "LB" ; break;
> case RB : type = "RB" ; break;
> default : type = Integer.toString(t.getType()); break;
> }
> System.out.printf("Token: %6s(%2d) >%s<\n", type,
> t.getChannel(), t.getText()); }
> while(t.getType() != -1);
> }
> }
>
> IDENT
> : (ID (WS ID)+)=> ID (WS ID)+ {$type = NAMES;}
Recall that ANTLR lexer constructs greedily consume the longest possible
string and that, further, once committed to a particular looping construct, no
other alternative is recognized. So your predicate wants a list of WS ID
pairs. Fine. But your input has some WS ID pairs and then a WS LB pair. The
loop sees the blank, commits to another WS ID iteration, and then finds '{'
where it expects an ID, so the input does not match (WS ID)+ and the
predicate fails.
Remove the blank before the { in your test data and see what I mean.
So try this IDENT rule instead:
IDENT
: ID ( ((WS ID)=>(WS ID))+ {$type = NAMES;} )?
;
> | ID
> ;
>
> WHITE
> : WS { $channel = HIDDEN; }
> ;
>
> LB : '{' ;
> RB : '}' ;
> DOT : '.' ;
>
> fragment NAMES :;
> fragment WS : (' '|'\r'|'\t'|'\u000C'|'\n')+;
> fragment ID : LETTER (LETTER|DIGIT)*;
> fragment DIGIT : '0'..'9';
> fragment LETTER: 'A'..'Z' | 'a'..'z' | '_';
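
To make the difference between the two IDENT rules above concrete, here is a minimal sketch in plain Java. It is not ANTLR-generated code: the class and helper names (GreedyLoopDemo, matchId, matchWs, matchCommitted, matchGuarded) are hypothetical, and the model assumes that the (WS ID)+ loop decides to iterate as soon as it sees a whitespace character, as described in the reply.

```java
// Hypothetical hand-written model of the two IDENT rules; not ANTLR output.
class GreedyLoopDemo {

    // LETTER : 'A'..'Z' | 'a'..'z' | '_'
    static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // DIGIT : '0'..'9'
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // ID : LETTER (LETTER|DIGIT)*  -- returns the position after the match, or -1.
    static int matchId(String s, int p) {
        if (p < 0 || p >= s.length() || !isLetter(s.charAt(p))) return -1;
        p++;
        while (p < s.length() && (isLetter(s.charAt(p)) || isDigit(s.charAt(p)))) p++;
        return p;
    }

    // WS : (' '|'\r'|'\t'|'\u000C'|'\n')+  -- returns the position after the match, or -1.
    static int matchWs(String s, int p) {
        int start = p;
        while (p < s.length() && " \r\t\f\n".indexOf(s.charAt(p)) >= 0) p++;
        return p > start ? p : -1;
    }

    // Models ID (WS ID)+ : once the loop sees whitespace it is committed to a
    // full WS ID iteration, so a trailing "WS {" makes the whole match fail.
    static int matchCommitted(String s, int p) {
        p = matchId(s, p);
        if (p < 0) return -1;
        int iterations = 0;
        while (matchWs(s, p) >= 0) {
            int q = matchId(s, matchWs(s, p));
            if (q < 0) return -1;   // committed to the iteration, cannot back out
            p = q;
            iterations++;
        }
        return iterations > 0 ? p : -1;
    }

    // Models ID ( ((WS ID)=>(WS ID))+ )? : each iteration is checked first,
    // so the loop simply stops before a trailing "WS {".
    static int matchGuarded(String s, int p) {
        p = matchId(s, p);
        if (p < 0) return -1;
        while (true) {
            int w = matchWs(s, p);
            if (w < 0) break;
            int q = matchId(s, w);
            if (q < 0) break;       // per-iteration check fails: leave the loop, consume nothing
            p = q;
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(matchCommitted("XRefs Outline GoTo {}", 0)); // -1
        System.out.println(matchCommitted("XRefs Outline GoTo{}", 0));  // 18
        System.out.println(matchGuarded("XRefs Outline GoTo {}", 0));   // 18
    }
}
```

In this model, the whole-rule check fails on the original test data because of the blank before the {, and succeeds once that blank is removed. The per-iteration check stops after GoTo in either case, leaving the blank to be matched by WHITE.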