[antlr-interest] Lexer token type problem

Mon Nov 25 08:14:30 PST 2002

The lexer always assigns a token the type of the outermost rule that matched
it, not the type of any protected rule called from that rule.  That way
numbers don't come out as DIGITs, etc.  Simply add a $setType() action after
the references to PARAGRAPH and CRNL.  It's a little trickier if you need to
test for literals too.  An example of that is in the ID rule of the GCC
parser:
http://www.codetransform.com/gcc.html

Monty
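
To make the rule above concrete, here is a small hand-written Java model of
the behaviour.  It is only a sketch, not the code ANTLR generates: the class
name TokenTypeSketch, the helper typeOf(), the useSetType flag and the token
type numbers are all made up for illustration.  The one point it tries to
show is that the type is held by the outermost rule, and that calling a
protected rule merely consumes characters unless the outer rule overrides
the type itself.

```java
// Simplified, hand-written model of the lexer behaviour; NOT generated code.
class TokenTypeSketch {
    // Hypothetical token type numbers, chosen only for this illustration.
    static final int CRNL_PARAGRAPH = 4;
    static final int CRNL = 5;
    static final int PARAGRAPH = 6;

    private final String input;
    private final boolean useSetType; // true models the {$setType(...);} actions
    private int pos = 0;

    TokenTypeSketch(String input, boolean useSetType) {
        this.input = input;
        this.useSetType = useSetType;
    }

    // Models nextToken(): only the non-protected rule is called directly,
    // and the type it ends up with becomes the type of the returned token.
    int nextTokenType() {
        return mCRNL_PARAGRAPH();
    }

    // Non-protected rule. Its type starts out as the rule's own type.
    private int mCRNL_PARAGRAPH() {
        int ttype = CRNL_PARAGRAPH;
        if (input.startsWith("\r\nT", pos)) {
            mPARAGRAPH();
            if (useSetType) ttype = PARAGRAPH;   // models $setType(PARAGRAPH)
        } else if (input.startsWith("\r\n", pos)) {
            // Simplification: the real predicate also requires a following
            // character other than 'T'.
            mCRNL();
            if (useSetType) ttype = CRNL;        // models $setType(CRNL)
        } else {
            throw new IllegalStateException("no viable alternative at " + pos);
        }
        return ttype;
    }

    // Protected rules: when called from another rule they only consume
    // characters; they neither create a token nor change the caller's type.
    private void mCRNL() { match("\r\n"); }

    private void mPARAGRAPH() { match("\r\nT"); }

    private void match(String s) {
        if (!input.startsWith(s, pos)) {
            throw new IllegalStateException("expected input not found at " + pos);
        }
        pos += s.length();
    }

    // Convenience helper: type of the first token in the given input.
    static int typeOf(String input, boolean useSetType) {
        return new TokenTypeSketch(input, useSetType).nextTokenType();
    }

    public static void main(String[] args) {
        System.out.println(typeOf("\r\nT", false)); // original grammar
        System.out.println(typeOf("\r\nT", true));  // with $setType actions
        System.out.println(typeOf("\r\nX", true));
    }
}
```

With useSetType set to false the model returns CRNL_PARAGRAPH for both
inputs, which matches what Matthew saw; with it set to true the type is
PARAGRAPH or CRNL depending on the alternative taken.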

-----Original Message-----
From: Matthew Ford [mailto:Matthew.Ford at forward.com.au]
Sent: Friday, November 22, 2002 4:23 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Lexer token type problem

Hi all,
I have a lexer with the following rules (and others)

CRNL_PARAGRAPH
 : ('\r' '\n' 'T') => PARAGRAPH
 | ('\r' '\n' ~('T')) => CRNL
 ;

protected
CRNL
 : '\r' '\n'
 ;

protected
PARAGRAPH
 : "\r\nT"
 ;

I expected the token types PARAGRAPH and CRNL to be returned, but I only got
CRNL_PARAGRAPH, even though the rules PARAGRAPH and CRNL were called.

Changing CRNL_PARAGRAPH to

CRNL_PARAGRAPH
 : ('\r' '\n' 'T')=>PARAGRAPH {$setType(PARAGRAPH);}
 | ('\r' '\n' ~('T')) => CRNL {$setType(CRNL);}
 ;

fixed the problem, but it is still not clear to me why the original version
does not return the types I expected.

Any comments?

matthew

Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service. 
