[antlr-interest] Lexer Character Set Question (ANTLR3.0b6)

Curtis Clauson NOSPAM at TheSnakePitDev.com
Wed Feb 21 16:56:18 PST 2007


Your problem is indeed lexer-related. The ANTLR lexer runs separately 
from the parser and cannot take parser context into account. There is 
no way for the lexer to know that the single letter after the escape 
start character '$' should be a different token from an ordinary 
sequence of letters. This means you must include the escape flag 
character as part of the escape start token.
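To make the point concrete, here is a small hand-written Java sketch (not ANTLR-generated code) of a context-free lexer that only knows a multi-letter Letters token plus the '$' and ')' characters. The class name NaiveLexerDemo and the token labels DOLLAR and RPAREN are hypothetical. Because the lexer greedily matches the longest run of letters and never consults what the parser expects, the flag 'U' in "$Uabc)" is swallowed into the Letters token:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a context-free lexer whose only letter token
// is a run of one or more letters. It cannot know that a parser rule
// wants a single-letter flag right after '$'.
final class NaiveLexerDemo {
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '$') {
                tokens.add("DOLLAR:$");
                i++;
            } else if (c == ')') {
                tokens.add("RPAREN:)");
                i++;
            } else if (isLetter(c)) {
                // Maximal munch: consume every consecutive letter.
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add("Letters:" + input.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The flag 'U' ends up inside the Letters token "Uabc".
        System.out.println(tokenize("$Uabc)"));
    }
}
```

Running main prints [DOLLAR:$, Letters:Uabc, RPAREN:)], so a parser rule expecting a one-letter flag after '$' never gets one.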

The following is one way to accomplish this. Tested on antlr-3.0b6:

/*
  * Parser rules
  */
content
     :   (   escape
             {
                 System.out.println(
                       "Escape " + $escape.flag + ": \""
                     + $escape.id + "\""
                 );
             }
         |   Letters
             {
                 System.out.println(
                     "String  : \"" + $Letters.text + "\""
                 );
             }
         )*
     ;

escape
returns [char flag, String id]
     :   EscapeStart Letters EscapeEnd
         {
             $flag = $EscapeStart.text.charAt(0);
             $id   = $Letters.text;
         }
     ;


/*
  * Lexer rules
  */
Letters     :   Letter+;
// Match '$' plus one letter; keep only the flag letter as the token text.
EscapeStart :   '$' Letter {setText($Letter.text);};
EscapeEnd   :   ')';

fragment
Letter      :   'a'..'z' | 'A'..'Z';
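For readers without ANTLR at hand, the following is a rough hand-written Java analogue of the grammar above, assuming Java 16+ for the record type. It is an illustrative sketch, not the code ANTLR generates; EscapeDemo, Type and Token are hypothetical names. The lexer folds the flag letter into the EscapeStart token, so the parser can pick it up without needing any context:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written analogue of the grammar above (not ANTLR-generated code).
final class EscapeDemo {
    enum Type { ESCAPE_START, LETTERS, ESCAPE_END }

    record Token(Type type, String text) {}

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Lexer: EscapeStart consumes '$' plus exactly one letter and keeps
    // only that letter as its text, mirroring setText($Letter.text).
    static List<Token> lex(String in) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '$' && i + 1 < in.length() && isLetter(in.charAt(i + 1))) {
                out.add(new Token(Type.ESCAPE_START, String.valueOf(in.charAt(i + 1))));
                i += 2;
            } else if (c == ')') {
                out.add(new Token(Type.ESCAPE_END, ")"));
                i++;
            } else if (isLetter(c)) {
                int start = i;
                while (i < in.length() && isLetter(in.charAt(i))) {
                    i++;
                }
                out.add(new Token(Type.LETTERS, in.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }

    // Parser: content : (escape | Letters)* ;
    //         escape  : EscapeStart Letters EscapeEnd ;
    static List<String> parse(List<Token> toks) {
        List<String> lines = new ArrayList<>();
        int p = 0;
        while (p < toks.size()) {
            Token t = toks.get(p);
            if (t.type() == Type.LETTERS) {
                lines.add("String  : \"" + t.text() + "\"");
                p++;
            } else if (t.type() == Type.ESCAPE_START
                    && p + 2 < toks.size()
                    && toks.get(p + 1).type() == Type.LETTERS
                    && toks.get(p + 2).type() == Type.ESCAPE_END) {
                char flag = t.text().charAt(0);
                lines.add("Escape " + flag + ": \"" + toks.get(p + 1).text() + "\"");
                p += 3;
            } else {
                throw new IllegalStateException("Syntax error at token " + p + ": " + t);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : parse(lex("abc$Uxyz)def"))) {
            System.out.println(line);
        }
    }
}
```

For the input abc$Uxyz)def it prints three lines, String  : "abc", Escape U: "xyz" and String  : "def", in the same format as the actions in the content rule.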



Bert Williams wrote:
> Given the following rules abstracted from a larger grammar:
> 
> using_rule_1: '$' flag id ')' ;
> flag   : ( 'U' ) ;
> id: (ALPHA)+ ;
> using_rule_2 : (ALPHA)+ ;
> 
> ALPHA : ('a'..'z'|'A'..'Z' ) ;
> 
> 
> I need to understand how to indicate in “using_rule_1” that “flag”, 
> rather than ALPHA, is intended. I believe that I should use a Semantic 
> Predicate, but can find no way to indicate something like:
> 
> “ if I am in “using_rule_1”, accept as “flag” those characters indicated 
> in rule “flag”. “
> 
> i.e.,
> The lexer appears to generate different tokens for ‘U’ and for ALPHA, 
> which means that in “using_rule_2”, I get no matches for ‘U’ (one of the 
> flags). This is true even though ‘U’ is a member of the ALPHA set.
> 
> Is this the right direction, or should I consider making “flag” a 
> lexer rule and modifying ALPHA accordingly?
> 
> Thanks!
> Bert Williams.

-- 
"Any sufficiently over-complicated magic is indistinguishable from 
technology." -- Llelan D.


