[antlr-interest] Keeping lookahead low

Alexey Demakov demakov at ispras.ru
Thu Aug 25 02:27:05 PDT 2005


Two days ago I sent you a corrected grammar, but I don't see my changes in
the grammar attached to your message. I removed the THING_ID rule
and changed the ID rule as follows:

ID
options {
 testLiterals = true;
}
 : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
  {
   // Retype the token as THING_ID when it is "Thing" followed only by digits.
   String t = $getText;
   if( t.startsWith( "Thing" ) && t.length() > "Thing".length() )
   {
    boolean isNumber = true;
    for( int i = "Thing".length(); i < t.length(); i++ )
    {
     char c = t.charAt(i);
     // Any non-digit after the prefix means this is a regular ID.
     if( c < '0' || '9' < c ) { isNumber = false; break; }
    }
    if( isNumber ) $setType( THING_ID );
   }
  }
 ;

With this change, all your test cases are GOOD.
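
The check done in the ID action can also be tried outside ANTLR. The following is only a sketch in plain Java: the class name ThingIdClassifier and the method classify are hypothetical, and the token types are returned as strings instead of being set with $setType.

```java
// Standalone sketch of the check performed in the ID rule's action.
// ThingIdClassifier and classify are hypothetical names used for illustration.
final class ThingIdClassifier {
    static final String PREFIX = "Thing";

    /** Returns "THING_ID" for "Thing" followed by one or more digits, otherwise "ID". */
    static String classify(String text) {
        // No prefix, or the prefix alone: a regular identifier.
        if (!text.startsWith(PREFIX) || text.length() == PREFIX.length()) {
            return "ID";
        }
        // Every character after the prefix must be a decimal digit.
        for (int i = PREFIX.length(); i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return "ID";
            }
        }
        return "THING_ID";
    }
}
```

Applied to the words in your test file, classify gives ID for "Th", "Thing", "Thingy" and "foo", and THING_ID for "Thing123". Because the whole identifier is matched first and retyped afterwards, the lexer does not need more lookahead, however long the prefix is.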

When the parser sees "THING_ID = ID", it decides that the loop is finished,
because THING_ID cannot start an assignment, and tries to match the token
that follows the loop, which is '}'.
I agree that the message "expecting ID or '}'" would be better, but ANTLR can't do that.
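
To illustrate the loop decision, here is a hand-written sketch in plain Java. It is not the code ANTLR generates. It assumes the block body has the shape ( ID ASSIGN ( ID | THING_ID ) )* RCURLY, which is my guess from your description, and the names LoopDecisionSketch, parseBody and la are hypothetical.

```java
import java.util.List;

// Hand-written sketch of an LL(1) decision for: ( assignment )* RCURLY
// Token types are plain strings; the rule shape is an assumption.
final class LoopDecisionSketch {
    /** Returns the token type at pos, or "EOF" past the end of input. */
    static String la(List<String> tokens, int pos) {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    /** Parses the block body and returns "ok" or an error message. */
    static String parseBody(List<String> tokens) {
        int pos = 0;
        // Loop decision: stay in the loop only while the next token is ID.
        while (la(tokens, pos).equals("ID")) {
            pos++; // consume the left-hand ID
            if (!la(tokens, pos).equals("ASSIGN")) return "expecting '='";
            pos++;
            String rhs = la(tokens, pos);
            if (!rhs.equals("ID") && !rhs.equals("THING_ID")) {
                return "expecting ID or THING_ID";
            }
            pos++;
        }
        // The loop has been left, so the closing brace is the only thing expected.
        if (!la(tokens, pos).equals("RCURLY")) return "expecting '}'";
        return "ok";
    }
}
```

For the token sequence THING_ID ASSIGN ID RCURLY the loop is never entered, so the only error the sketch can report is "expecting '}'", which corresponds to case 8.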

Regards,
Alexey

-----
Alexey Demakov
TreeDL: Tree Description Language: http://treedl.sourceforge.net
RedVerst Group: http://www.unitesk.com


----- Original Message ----- 
From: "Ciaran Treanor" <ciaran.treanor at gmail.com>
To: <antlr-interest at antlr.org>
Sent: Wednesday, August 24, 2005 7:46 PM
Subject: [antlr-interest] Keeping lookahead low


Following on from the help provided by Alexey and Olivier yesterday
I've cleaned up a test grammar I was working on and am left with one
question outstanding.

I have a test data file that looks like the following:
System {
  foo = Th       ! case 1:  BROKEN - rhs should be considered an ID
  foo = Thing    ! case 2: BROKEN - rhs should be considered an ID
  foo = Thing123 ! case 3: GOOD - rhs is a THING_ID
  foo = Thingy   ! case 4: GOOD - rhs is a regular id
  foo = foo      ! case 5: GOOD - rhs is a regular id
  Th = foo       ! case 6: BROKEN - lhs should be considered an ID
  Thing = foo    ! case 7: BROKEN - lhs should be considered an ID
  Thing123 = foo ! case 8: Why is error "expecting '}'" instead of expecting ID
  Thingy = foo   ! case 9: GOOD - lhs is a regular id
}

Can anyone tell me why the parser fails with the following error when
it encounters 'Th' or 'Thing'? Increasing lookahead to 6 fixes case 1
and case 2. Unfortunately, increasing the lookahead isn't really an
option for me since, in reality, 'Thingy' is actually a 20-character
word.

What's the simplest thing I can do to the grammar to fix the cases
above that I've flagged as broken?

Oh, can anyone explain the error reported for case 8? This case is an
assignment that looks like:
THING_ID = ID

Since the grammar is expecting assignments of the form:
ID = ( ID | THING_ID)

I would have thought the parser would complain that it found a
THING_ID when it was expecting a regular ID. Instead it complains about
expecting '}'. Why is that?

Thanks a million (oh, grammar and test file attached)
ct



