[antlr-interest] lexer rule attributes

Terence Parr parrt at cs.usfca.edu
Wed Nov 1 14:59:41 PST 2006


Hi,
Lexer rules always have an implicit return value of type Token that
is sent back to the parser. However, lexer rules that refer to other
lexer rules may access the portions of the overall token matched by
those rules, which are returned as implicit tokens. The following rule
illustrates a composite lexer rule that reuses another token definition.

PREPROC_CMD
        :       '#' ID {System.out.println("cmd="+$ID.text);}
        ;

ID      :       ('a'..'z'|'A'..'Z')+
        ;
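
To make the $ID.text reference concrete, here is a minimal hand-written
sketch (not actual ANTLR output) of how a composite rule could capture the
text matched by a nested rule. The class and method names are hypothetical
and chosen only for illustration.

```java
// Hand-written sketch, NOT code generated by ANTLR.
// All names here (PreprocCmdSketch, matchId, matchPreprocCmd) are hypothetical.
final class PreprocCmdSketch {
    private final String input;
    private int pos = 0;

    PreprocCmdSketch(String input) {
        this.input = input;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** ID : ('a'..'z'|'A'..'Z')+ ; returns the text it matched. */
    String matchId() {
        int start = pos;
        while (pos < input.length() && isLetter(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected a letter at index " + start);
        }
        return input.substring(start, pos);
    }

    /** PREPROC_CMD : '#' ID ; returns the text of the nested ID. */
    String matchPreprocCmd() {
        if (pos >= input.length() || input.charAt(pos) != '#') {
            throw new IllegalStateException("expected '#' at index " + pos);
        }
        pos++;
        // The nested rule hands back its portion of the overall token;
        // this value plays the role of $ID.text in the grammar action.
        return matchId();
    }

    public static void main(String[] args) {
        PreprocCmdSketch lexer = new PreprocCmdSketch("#include <x>");
        System.out.println("cmd=" + lexer.matchPreprocCmd());
    }
}
```

The point of the sketch is only that the outer rule sees the inner rule's
match as a separate value while still producing one overall token.
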
Lexer (non-fragment) rules may also contain actions that access  
attributes of the surrounding rule itself. Code generated for rules  
begins with a preamble that sets the predefined attributes:

ruleNestingLevel++;
int type = <standin>ruleTokenType</standin>;
int start = getCharIndex();
int line = getLine();
int charPosition = getCharPositionInLine();
int channel = Token.DEFAULT_CHANNEL;
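
As a rough illustration of where that preamble sits, the following
self-contained sketch puts the same locals at the top of a hand-written rule
method. It is not ANTLR's generated code: the class, the ID constant, the
skipTo helper and the getText(int) signature are assumptions, and
ruleNestingLevel is left out for brevity.

```java
// Hand-written sketch, NOT ANTLR output. Names are hypothetical.
final class RulePreambleSketch {
    static final int DEFAULT_CHANNEL = 0; // stand-in for Token.DEFAULT_CHANNEL
    static final int ID = 4;              // hypothetical token type number

    private final String input;
    private int index = 0;   // absolute character index
    private int lineNo = 1;  // current line, 1-based
    private int col = 0;     // position within the current line, 0-based

    RulePreambleSketch(String input) {
        this.input = input;
    }

    int getCharIndex() { return index; }
    int getLine() { return lineNo; }
    int getCharPositionInLine() { return col; }

    /** Text matched since the given start index (assumed helper). */
    String getText(int start) {
        return input.substring(start, index);
    }

    private void consume() {
        if (input.charAt(index) == '\n') {
            lineNo++;
            col = 0;
        } else {
            col++;
        }
        index++;
    }

    /** Advance to an absolute index so a rule can start mid-input. */
    void skipTo(int target) {
        while (index < target) {
            consume();
        }
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** ID : ('a'..'z'|'A'..'Z')+ ; with the preamble locals set first. */
    String mID() {
        // Preamble: snapshot the predefined attributes before matching.
        int type = ID;
        int start = getCharIndex();
        int line = getLine();
        int charPosition = getCharPositionInLine();
        int channel = DEFAULT_CHANNEL;

        while (index < input.length() && isLetter(input.charAt(index))) {
            consume();
        }

        // An action placed here can read the locals directly; the text has
        // to be computed from start up to the current index.
        return "type=" + type + " line=" + line + " pos=" + charPosition
                + " channel=" + channel + " text=" + getText(start);
    }
}
```

Note that line and charPosition are fixed when the rule starts, whereas the
matched text depends on how far the rule has consumed.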

But do we want to say $text, $line, etc. for consistency? It means
adding a bunch more templates to handle these predefined attributes.
$line would simply be translated to line, and so on; $text, however,
needs to become getText(). Hmm... should lexer rules be treated
differently?
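
For illustration only, the translation in question could be sketched as a
simple text rewrite. This is not how ANTLR's template-based code generation
works; the class name is hypothetical, and the attribute names other than
$text and $line are assumed to mirror the preamble's local variables.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the $attribute translation; NOT ANTLR's mechanism.
final class LexerAttributeTranslatorSketch {
    // Only $text and $line come from the discussion above; the remaining
    // names are assumptions taken from the preamble's local variables.
    private static final Pattern ATTR =
            Pattern.compile("\\$(text|line|type|start|charPosition|channel)\\b");

    /** Rewrites rule-level $attributes in an action into plain Java. */
    static String translate(String action) {
        Matcher m = ATTR.matcher(action);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            // $text cannot map to a local: it becomes a method call.
            String replacement = name.equals("text") ? "getText()" : name;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("System.out.println($line + \":\" + $text);"));
    }
}
```

References such as $ID.text are deliberately left untouched by this sketch,
since they name another rule's token rather than the surrounding rule.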

Ter

