[antlr-interest] Trouble with ANTLR 3 grammar

Emond Papegaaij e.papegaaij at student.utwente.nl
Tue Jul 4 00:41:42 PDT 2006


On Tuesday 04 July 2006 01:39, Terence Parr wrote:
> On Jul 1, 2006, at 5:26 AM, Emond Papegaaij wrote:
> > The lexer does hit 'channel=99', but only after the token is already
> > emitted. Printing the channel inside the loop in Main shows '0'. For
> > every WS token mWS is called twice. It seems that on the first call
> > the token is emitted, and on the second call the channel is set. I
> > can't explain why. Here is some output with println statements added
> > at the start of the method, at the 'channel=99' statement and at the
> > 'emit' statement:
> >
> > inWS
> > emit(11,1,9,0,9,9)
>
> Ok, ANTLR will match WS in guess mode (backtracking) and then do it
> again with feeling.  It only emits at the outermost token rule
> invoked (in case INT invokes DIGIT) and only if you have not emitted
> a token yourself:
>
>              if ( token==null && ruleNestingLevel==1 ) {
>                  emit(type,line,charPosition,channel,start,getCharIndex()-1);
>              }
>
> So, somehow during backtracking you are setting token (via emit()
> maybe).  Do you have an emit or token assignment inside an init action?
>
> This all works perfectly in my fuzzy java example.

It seems to be related to the ACTION tokens. Removing the tokens fixes the 
problem. WS was declared after the ACTION tokens, but moving it up didn't 
solve the problem. I've attached a minimised version of the first grammar 
that still shows WS tokens. The input is:

Printable {
  iface getString;
}

Using the Main I sent earlier, I get this result (WS = <7>):

[@0,0:8='Printable',<4>,1:0] channel = 0
[@1,9:9=' ',<7>,1:9] channel = 0
[@2,10:10='{',<8>,1:10] channel = 0
[@3,11:12='\n\t',<7>,1:11] channel = 0
[@4,13:17='iface',<5>,2:1] channel = 0
....
[specification]: line 1:9 mismatched token: [@1,9:9=' ',<7>,1:9]; expecting type '{'

Commenting out METHOD_SIG_ACTION and replacing it with IDENTIFIER correctly 
gives (WS = <6>):

[@0,0:8='Printable',<4>,1:0] channel = 0
[@2,10:10='{',<7>,1:10] channel = 0
[@4,13:17='iface',<5>,2:1] channel = 0
....

Changing METHOD_SIG_ACTION to no longer match certain whitespace characters 
causes the corresponding WS tokens to be moved to channel 99. For example 
changing METHOD_SIG_ACTION to (~(';'|' '))+ gives the following tokens (WS = 
<7>):

[@0,0:8='Printable',<4>,1:0] channel = 0
[@2,10:10='{',<8>,1:10] channel = 0
[@3,11:12='\n\t',<7>,1:11] channel = 0
[@4,13:17='iface',<5>,2:1] channel = 0
....

Note that token @1 is now at channel 99, but token @3 is still at channel 0.
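
To make the overlap between the two rule bodies concrete, here is a small Java sketch. It is not ANTLR code and does not explain why the channel stays at 0; it only checks which WS texts consist entirely of characters that METHOD_SIG_ACTION could also match, before and after the change to (~(';'|' '))+. The class and method names are made up for illustration.

```java
public class CharClassOverlap {
    // Characters accepted by the WS rule in the attached grammar.
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
    }

    // Models the original body (~(';'))+ : any character except ';'.
    static boolean inOriginalBody(char c) {
        return c != ';';
    }

    // Models the modified body (~(';'|' '))+ : any character except ';' and ' '.
    static boolean inModifiedBody(char c) {
        return c != ';' && c != ' ';
    }

    // True if every character of text is whitespace AND is also accepted
    // by the chosen METHOD_SIG_ACTION body, i.e. both rules could match it.
    static boolean overlaps(String text, boolean modified) {
        if (text.isEmpty()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean inBody = modified ? inModifiedBody(c) : inOriginalBody(c);
            if (!isWs(c) || !inBody) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = { "' '", "'\\n\\t'" };   // printable labels for tokens @1 and @3
        String[] texts = { " ", "\n\t" };         // the actual WS token texts
        for (int i = 0; i < texts.length; i++) {
            System.out.println(names[i]
                + " original=" + overlaps(texts[i], false)
                + " modified=" + overlaps(texts[i], true));
        }
    }
}
```

With the modified body, ' ' no longer overlaps while '\n\t' still does, which lines up with @1 moving to channel 99 and @3 staying on channel 0.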

As you can see in the grammar, I've only got actions that set the 'sigFollow' 
flag and one that sets channel=99. I don't emit tokens in the actions, or set 
token in any other way.
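
The symptom itself (the token is emitted before 'channel=99' runs) can be modelled outside ANTLR. The following Java sketch is a toy model, not the generated lexer: the guard is the one quoted above, while the class, the emitEarly flag and the Tok type are assumptions made up for illustration. It only shows that a token created before the channel assignment keeps channel 0.

```java
public class EmitOrderSketch {
    static final int HIDDEN = 99;

    // A token snapshot; the channel is copied at the moment of emission.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    int channel = 0;           // current channel, default 0
    Tok token = null;          // set once a token has been emitted
    int ruleNestingLevel = 0;  // 1 while inside the outermost token rule

    void emit(String text) {
        token = new Tok(text, channel);
    }

    // Hypothetical stand-in for mWS. emitEarly=true models the order the
    // println output suggests: the token exists before the action runs.
    void matchWs(String text, boolean emitEarly) {
        ruleNestingLevel++;
        if (emitEarly) {
            emit(text);                 // token captured with channel 0
        }
        channel = HIDDEN;               // the { channel=99; } action
        if (token == null && ruleNestingLevel == 1) {
            emit(text);                 // guard as quoted: only if not yet emitted
        }
        ruleNestingLevel--;
    }

    // Runs the toy rule on a fresh instance and returns the emitted token.
    static Tok lex(String text, boolean emitEarly) {
        EmitOrderSketch lexer = new EmitOrderSketch();
        lexer.matchWs(text, emitEarly);
        return lexer.token;
    }

    public static void main(String[] args) {
        System.out.println("expected order: channel = " + lex(" ", false).channel);
        System.out.println("observed order: channel = " + lex(" ", true).channel);
    }
}
```

In this model the guard never fires in the early-emit case because token is already set, so the later channel assignment has no effect on the token the parser sees.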

Best regards,
Emond
-------------- next part --------------
grammar TPL;

options {
	filter = true;
}

@lexer::members {
	private boolean sigFollow = false;
}

specification
	: IDENTIFIER '{' ifaceMethod* '}'
	;

ifaceMethod
	: IFACE METHOD_SIG_ACTION ';'
	;

IFACE: 'iface' {sigFollow = true;};

WS: ( ' ' | '\t' | '\f' | '\r' | '\n')+ { channel=99; } ;

METHOD_SIG_ACTION: {sigFollow}?=>
                   {sigFollow = false;}
                   (~(';'))+
                   ;

IDENTIFIER: ('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')* ;

