[antlr-interest] ANTLR : lexer question

dognogod dognogod at yahoo.com
Fri Oct 11 14:27:34 PDT 2002


Hi,

I wrote the following lexer to illustrate my question.

============================================
class MyLexer extends Lexer;

options {
    k = 5;
    charVocabulary = '\3'..'\377';
}

WS: ( ' '  | '\t' | ( '\r' | '\n' ) { newline (); })
    { $setType(Token.SKIP); };


TOK1: "CONF_"; 

ALPHANUM: ('a'..'z') | ('A'..'Z') | ('0'..'9');

================================================

I use an input file containing:
CONF_
blabla
CLASH
CONF_1
CONFIGURATION


When I print all the tokens, I get the following:
["CONF_",<5>,line=1,col=1]
["b",<6>,line=2,col=1]
["l",<6>,line=2,col=2]
["a",<6>,line=2,col=3]
["b",<6>,line=2,col=4]
["l",<6>,line=2,col=5]
["a",<6>,line=2,col=6]
["C",<6>,line=3,col=1]
["L",<6>,line=3,col=2]
["A",<6>,line=3,col=3]
["S",<6>,line=3,col=4]
["H",<6>,line=3,col=5]
["CONF_",<5>,line=4,col=1]
["1",<6>,line=4,col=6]
Exception TokenStreamRecognitionException 5 : expecting '_', found 'I'

It seems ANTLR always expects the characters "CONF" to be followed by
the '_' character.


I found a way to work around this problem by adding the following rule:
NTOK1: ('C' ~'O' | "CO" ~'N' | "CON" ~'F' | "CONF" ~'_');


If I print the tokens again, I no longer get an exception:

["CONF_",<5>,line=1,col=1]
["b",<7>,line=2,col=1]
["l",<7>,line=2,col=2]
["a",<7>,line=2,col=3]
["b",<7>,line=2,col=4]
["l",<7>,line=2,col=5]
["a",<7>,line=2,col=6]
["CL",<6>,line=3,col=1]
["A",<7>,line=3,col=3]
["S",<7>,line=3,col=4]
["H",<7>,line=3,col=5]
["CONF_",<5>,line=4,col=1]
["1",<7>,line=4,col=6]
["CONFI",<6>,line=5,col=1]
["G",<7>,line=5,col=6]
["U",<7>,line=5,col=7]
["R",<7>,line=5,col=8]
["A",<7>,line=5,col=9]
["T",<7>,line=5,col=10]
["I",<7>,line=5,col=11]
["O",<7>,line=5,col=12]
["N",<7>,line=5,col=13]
["null",<1>,line=6,col=1]

However:

1. This obliges me to define a "non" token for every token I define.
2. I don't get the ALPHANUM token in the following two cases:
  - "CLASH" is scanned as "CL" + 'A' + ... (instead of 'C' 'L' 'A' ...)
  - "CONFIGURATION" is scanned as "CONFI" + 'G' + ...


My question is the following: how can I define tokens such as
TOK1: "CONF_" and make sure they don't interfere with input like
"CONFIGURATION", "CLASH", ...?
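
To make the desired behaviour concrete, here is a minimal sketch in plain
Java. It is not ANTLR-generated code and makes no claim about how ANTLR
works internally; the class name LiteralScanDemo and the method scan are
hypothetical names made up for this illustration. It emits TOK1 only when
all five characters of "CONF_" are present, and otherwise falls back to one
ALPHANUM token per character:

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralScanDemo {
    // Mirrors the TOK1 literal from the grammar above.
    static final String LITERAL = "CONF_";

    // Returns tokens as "TYPE:text" strings (a hypothetical format for this demo).
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // skip whitespace, like the WS rule
            } else if (input.startsWith(LITERAL, i)) {
                // Commit to TOK1 only after the whole literal has been seen.
                tokens.add("TOK1:" + LITERAL);
                i += LITERAL.length();
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                // Anything else alphanumeric is a single-character ALPHANUM.
                tokens.add("ALPHANUM:" + c);
                i++;
            } else {
                throw new IllegalArgumentException(
                        "unexpected char '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same input as the test file in this message.
        String input = "CONF_\nblabla\nCLASH\nCONF_1\nCONFIGURATION\n";
        for (String token : scan(input)) {
            System.out.println(token);
        }
    }
}
```

With this scanner, "CLASH" and "CONFIGURATION" come out as single-character
ALPHANUM tokens and "CONF_1" as TOK1 followed by ALPHANUM '1', which is the
result I would like the ANTLR lexer to produce.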


Thanks,


Could you please also reply to me directly (dognogod at yahoo.com)?

S.
