[antlr-interest] ANTLR : lexer question

dognogod dognogod at yahoo.com
Fri Oct 11 14:27:34 PDT 2002


I wrote the following lexer to illustrate my question:

class MyLexer extends Lexer;

options {
    k = 5;
    charVocabulary = '\3'..'\377';
}

WS: ( ' '  | '\t' | ( '\r' | '\n' ) { newline (); })
    { $setType(Token.SKIP); };

TOK1: "CONF_"; 

ALPHANUM: ('a'..'z') | ('A'..'Z') | ('0'..'9');


I use an input file containing:

When I print all the tokens, I get the following:
Exception TokenStreamRecognitionException 5 : expecting '_', found 'I'

It seems ANTLR always expects the "CONF" characters to be followed by
the '_' character.
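The error suggests that, once its lookahead points toward TOK1, the generated lexer commits to that rule and cannot fall back to ALPHANUM when the literal turns out to be incomplete. For comparison, here is a minimal hand-written sketch in Java of the behavior being asked for. The class and method names are hypothetical; this is not ANTLR-generated code and not part of ANTLR's API. The sketch checks that the complete literal "CONF_" is present before committing to TOK1 and otherwise emits single-character ALPHANUM tokens. It assumes ASCII input and does not model ANTLR's newline() line counting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer: NOT ANTLR-generated code and not ANTLR's API.
// It mirrors MyLexer's token set: WS (skipped), TOK1 = "CONF_", ALPHANUM = one char.
class BacktrackingLexerSketch {
    static final String TOK1_TEXT = "CONF_";

    // Same character ranges as the ALPHANUM rule: 'a'..'z', 'A'..'Z', '0'..'9'.
    static boolean isAlphaNum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++; // WS: consumed and dropped, like $setType(Token.SKIP)
            } else if (input.startsWith(TOK1_TEXT, pos)) {
                // Commit to TOK1 only when the whole literal is present.
                tokens.add("TOK1");
                pos += TOK1_TEXT.length();
            } else if (isAlphaNum(c)) {
                // Otherwise fall back to a one-character ALPHANUM token.
                tokens.add("ALPHANUM:" + c);
                pos++;
            } else {
                throw new IllegalArgumentException("unexpected char '" + c + "' at offset " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "CONFIG" and "CLASH" become ALPHANUM tokens instead of raising an error.
        System.out.println(tokenize("CONF_ CONFIG CLASH"));
    }
}
```

With this approach, "CONFIG" and "CLASH" both come out as one ALPHANUM token per character, and no "non" token such as NTOK1 is needed.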

I found a way to work around this problem by adding the following rule:
NTOK1: ('C' ~'O' | "CO" ~'N' | "CON" ~'F' | "CONF" ~'_');

If I print the tokens again, I don't get any exceptions:



1. This obliges me to define a "non" token for every token I define.
2. I don't get ALPHANUM tokens in the following two cases:
  - "CLASH" is scanned as "CL" + 'A' + ... (instead of 'C' + 'L' + 'A' + ...)
  - "CONFIGURATION" is scanned as "CONFI" + 'G' + ...
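To see why the workaround swallows extra characters, here is a small Java model of what NTOK1 matches at the start of a word. The class and method names are hypothetical and this is not ANTLR-generated code; it also ignores the charVocabulary limits on ~'x'. Each alternative of NTOK1 is a prefix of "CONF_" followed by one character that is not the next character of "CONF_", and that extra character becomes part of the NTOK1 token.

```java
// Hypothetical model of the NTOK1 workaround rule, NOT ANTLR-generated code:
//   NTOK1: ('C' ~'O' | "CO" ~'N' | "CON" ~'F' | "CONF" ~'_');
// Each alternative is a prefix of "CONF_" followed by one character that is
// NOT the next character of "CONF_"; that extra character is part of NTOK1.
// charVocabulary limits on ~'x' are ignored here.
class Ntok1Sketch {
    static final String[] PREFIXES = {"C", "CO", "CON", "CONF"};
    static final char[] EXCLUDED = {'O', 'N', 'F', '_'};

    // Returns the text NTOK1 would match at the start of `word`, or null if no alternative matches.
    static String match(String word) {
        for (int i = 0; i < PREFIXES.length; i++) {
            String prefix = PREFIXES[i];
            int next = prefix.length();
            if (word.startsWith(prefix) && next < word.length()
                    && word.charAt(next) != EXCLUDED[i]) {
                return word.substring(0, next + 1); // prefix plus the "not" character
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"CLASH", "CONFIGURATION", "CONF_"}) {
            System.out.println(w + " -> " + match(w));
        }
    }
}
```

Running it prints "CLASH -> CL" and "CONFIGURATION -> CONFI", matching the two cases listed above, while "CONF_" matches no NTOK1 alternative and is left to TOK1.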

My question is: how can I define tokens such as
TOK1: "CONF_" and make sure they don't interfere with things like

Could you (also) please reply to me directly (dognogod at yahoo.com)?


