[antlr-interest] modifying token creation

Heiko Folkerts Heiko.Folkerts at david-bs.de
Thu Sep 24 22:56:16 PDT 2009


Hi Indhu and David,
OK, maybe Indhu is right and modifying the token creation is the wrong way to solve my error handling problems. When I try to produce user-friendly error messages for wrong input, the recognizer state gives me nothing that helps. The backtracking tries all paths and then reports a no viable alternative exception. For example, this is an excerpt from the grammar:
statement: 
    actionexpression
    | statecheck;
statecheck: stateobject (stateoption  | ) compoperator selectedstate  -> ^(STATECHECK stateobject ^(COMPERATOR compoperator) selectedstate stateoption) |
    stateobject (stateoption  | ) compoperator paramname-> ^(STATECHECK stateobject ^(COMPERATOR compoperator) ^(PARAMREF paramname) stateoption);
actionexpression: actionobject (actionoption | ) action -> ^(ACTIONEXPRESSION actionobject action actionoption);

The stateobject rules etc. use syntactic predicates to check whether the input is a correct keyword, e.g. that engine is a legal object.

Now if I enter the input "engine;", where engine is an object (stateobject and actionobject have the same meaning), I need to tell the user that we expect either an actionexpression or a statecheck instead of the ';'.

I am coding in C, so until ANTLR 3.2 the catch block in the grammar was not supported.

So how would I best deal with such situations?

Thx
Heiko

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of David-Sarah Hopwood
Sent: Thursday, 24 September 2009 09:10
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Howto modify token creation?

Indhu Bharathi wrote:
> You can do something like
> 
> ID	: LETTER (LETTER|DIGIT)*
> 	{
> 		String text = getText();
> 		Integer tknType;
> 		if( (tknType=table.get(text))!=null ) {
> 			$type = tknType;
> 		}
> 	}
> 
> The table can be passed to the lexer using some member function. But I 
> don't know any clean way how to make sure ANTLR lexer doesn't assign 
> the same int to some other token.

YourParser.tokenNames.length should be the first unused token number.

(Obviously, this is relying on an implementation detail, but probably a fairly stable one.)
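
To make the numbering idea concrete, here is a minimal sketch of a keyword table that hands out token types starting at the length of the generated names array. The class name DynamicTokenTypes and its methods are hypothetical and not part of ANTLR; the array passed in stands for YourParser.tokenNames, and the sketch assumes that all dynamic types are allocated through this one table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not an ANTLR class): allocates token types above the generated ones. */
final class DynamicTokenTypes {
    // First unused token number, assumed to be the length of the generated tokenNames array.
    private final int firstFree;
    // Keyword text -> dynamically assigned token type, in registration order.
    private final Map<String, Integer> table = new LinkedHashMap<>();

    DynamicTokenTypes(String[] staticTokenNames) {
        this.firstFree = staticTokenNames.length;
    }

    /** Registers a keyword and returns its token type; registering twice returns the same type. */
    int register(String text) {
        Integer type = table.get(text);
        if (type == null) {
            type = firstFree + table.size();
            table.put(text, type);
        }
        return type;
    }

    /** Lookup as used in the lexer action above: null means "not a dynamic keyword". */
    Integer get(String text) {
        return table.get(text);
    }
}
```

The lexer action quoted above could then call get(text) on such a table and leave the token type alone whenever the result is null.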

Note that if you use token numbers >= tokenNames.length, you should override getErrorMessage in the parser, so that it doesn't throw an ArrayIndexOutOfBoundsException when constructing an error message involving one of these tokens. For example:

@parser::members {
  public String getErrorMessage(RecognitionException e, String[] names) {
    String[] dynamicNames = /* array also containing dynamic token names */;
    return super.getErrorMessage(e, dynamicNames);
  }
}
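
One way the dynamicNames array could be built is sketched below. The helper DynamicTokenNames.extend is hypothetical, not an ANTLR API, and it assumes the dynamic token types were handed out contiguously, in order, starting at the length of the generated tokenNames array, so that each array index equals the token type.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not an ANTLR class): builds a names array covering dynamic token types. */
final class DynamicTokenNames {
    /**
     * Returns a copy of staticNames with dynamicNames appended.
     * Assumes the i-th dynamic name belongs to token type staticNames.length + i.
     */
    static String[] extend(String[] staticNames, List<String> dynamicNames) {
        String[] all = Arrays.copyOf(staticNames, staticNames.length + dynamicNames.size());
        for (int i = 0; i < dynamicNames.size(); i++) {
            all[staticNames.length + i] = dynamicNames.get(i);
        }
        return all;
    }
}
```

In the override above, the result of extend(names, ...) would take the place of the placeholder comment; since the dynamic keywords rarely change, the array could presumably be built once and cached rather than rebuilt for every error.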

--
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com



