[antlr-interest] Failing on case test: how?

Sam Harwell sharwell at pixelminegames.com
Wed Nov 12 09:43:55 PST 2008


Can you make a rule like:

variable returns [Variable result]
  : IDENTIFIER { $result = getVariable($IDENTIFIER.text); }
  ;

where the check is done in the indicated function (getVariable)? If there are places where only one kind of variable is allowed, you can also do that semantic analysis at a later point. The advantage is that you get a more "lenient" parser, which can give better error messages once you start working on them.
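A minimal sketch of what such a getVariable helper might look like, assuming Java as the target. The Variable, FirstOrderVariable and SecondOrderVariable classes below are hypothetical stand-ins for your own types, and the VariableLookup class and its cache are only for illustration; in a real grammar the method would probably live in the @members section.

```java
import java.util.HashMap;
import java.util.Map;

public class VariableLookup {
    // Hypothetical stand-ins for the poster's own variable classes.
    public static class Variable {
        public final String name;
        Variable(String name) { this.name = name; }
    }
    public static class FirstOrderVariable extends Variable {
        FirstOrderVariable(String name) { super(name); }
    }
    public static class SecondOrderVariable extends Variable {
        SecondOrderVariable(String name) { super(name); }
    }

    // Cache so the same identifier always yields the same object.
    private final Map<String, Variable> known = new HashMap<String, Variable>();

    // Decides the kind of variable from the case of the first code point.
    public Variable getVariable(String text) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("empty identifier");
        }
        Variable v = known.get(text);
        if (v != null) {
            return v;
        }
        // codePointAt handles characters outside the Basic Multilingual Plane.
        int first = text.codePointAt(0);
        if (Character.isLowerCase(first)) {
            v = new FirstOrderVariable(text);
        } else if (Character.isUpperCase(first)) {
            v = new SecondOrderVariable(text);
        } else {
            // This is the place to report a friendly error instead of failing the parse.
            throw new IllegalArgumentException(
                "identifier must start with a letter that has a case: " + text);
        }
        known.put(text, v);
        return v;
    }
}
```

Because the parser accepts any IDENTIFIER here, a wrongly cased name reaches this method and can be reported with a specific message rather than as a generic syntax error.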

The other option with *gated* semantic predicates is (I don't do Java so the code is a best guess on what it might be in Java):

firstOrderVariable returns [FirstOrderVariable result]
  : {Character.isLowerCase( input.LT(1).getText().charAt(0) )}?=>
    IDENTIFIER { $result = getFirstOrderVariable($IDENTIFIER.text); }
  ;

secondOrderVariable returns [SecondOrderVariable result]
  : {Character.isUpperCase( input.LT(1).getText().charAt(0) )}?=>
    IDENTIFIER { $result = getSecondOrderVariable($IDENTIFIER.text); }
  ;
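Since you want to allow (almost) any Unicode character in identifiers, it may be cleaner to move the case test into small helper methods and call those from the predicates, passing the text of the next token (in the Java target that would be something like input.LT(1).getText()). The following is only a sketch: the class name CasePredicates and both method names are made up for illustration.

```java
public class CasePredicates {
    // True if the text is non-empty and its first code point is a lowercase letter.
    public static boolean startsWithLowerCase(String text) {
        return text != null
            && !text.isEmpty()
            && Character.isLowerCase(text.codePointAt(0));
    }

    // True if the text is non-empty and its first code point is an uppercase letter.
    public static boolean startsWithUpperCase(String text) {
        return text != null
            && !text.isEmpty()
            && Character.isUpperCase(text.codePointAt(0));
    }
}
```

Note that an identifier starting with a digit or symbol satisfies neither test, so neither rule would match it; whether that is what you want depends on the rest of the grammar.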

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Hendrik Maryns
Sent: Wednesday, November 12, 2008 11:04 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Failing on case test: how?

Hi,

I have two rules as follows:

firstOrderVariable returns [FirstOrderVariable result]
	: IDENTIFIER { $result = getFirstOrderVariable($IDENTIFIER.text); }
	;

secondOrderVariable returns [SecondOrderVariable result]
	: IDENTIFIER { $result = getSecondOrderVariable($IDENTIFIER.text); }
	;

with

IDENTIFIER
  : ( ~( CLOSE | OPEN | WS ) )+
  ;

OPEN : '(' ;
CLOSE : ')' ;
fragment WS : ( ' ' | '\t' ) ;

Most of the time it is no problem that they are identical, since the surrounding rule does the disambiguation.  Unfortunately, in some cases it won’t.  I would like to distinguish them by requiring that first-order variables start with a lowercase letter and second-order variables with an uppercase letter.  So I’d like to insert a semantic predicate which checks the case of the first letter and fails as appropriate.  Is this possible, and if so, how?

If it isn’t, I can make sure the surrounding rules do the disambiguation, but this solution would be nicer.

Originally, I had a lexer rule FIRST_ORDER_VARIABLE: LOWER ( LOWER | UPPER )* and a similar one for SECOND_ORDER_VARIABLE, but that didn’t work properly, since it made it impossible to match other kinds of input with the IDENTIFIER rule.  (The text would be lexed as a FIRST_ORDER_VARIABLE where an IDENTIFIER was expected, so parsing would fail.  I may be interpreting the problem incorrectly, but at least now it works.)  Also, allowing just letters is too simple: I want to allow (almost) every possible Unicode character.

TIA, H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html


