[antlr-interest] ANTLR : lexer question

Fri Oct 11 14:30:37 PDT 2002

See the documentation about the lexer's handling of literals (the literals table and the testLiterals option).
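As a rough illustration of the idea (a minimal sketch, not ANTLR's generated code): match the longest run of word characters first, then look the whole text up in a table of literals to pick the token type, so "CONFIGURATION" can never be half-consumed by the "CONF_" literal. The class LiteralsLexerSketch, the method tokenize, the LITERALS map and the WORD/NUMBER token types are hypothetical. The sketch also assumes that words are letters plus '_' and that digits form a separate NUMBER token, so "CONF_1" still splits into "CONF_" and "1" as in the quoted output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "literals table" lexer: match the longest word,
// then look the word up in a table of literals to decide its token type.
class LiteralsLexerSketch {

    // A token is just a type name plus the matched text.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return "[\"" + text + "\"," + type + "]";
        }
    }

    // Literals table: exact word text -> token type (TOK1 mirrors the grammar).
    // Only whole words are looked up, so "CONFIGURATION" never matches "CONF_".
    static final Map<String, String> LITERALS = new HashMap<>();
    static {
        LITERALS.put("CONF_", "TOK1");
    }

    // Assumption: words are letters plus '_'; digits are lexed separately.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // skip whitespace, like the WS rule's Token.SKIP
            } else if (isWordChar(c)) {
                int start = i;
                while (i < input.length() && isWordChar(input.charAt(i))) i++;
                String text = input.substring(start, i);
                // Consult the literals table only after the longest word is matched.
                tokens.add(new Token(LITERALS.getOrDefault(text, "WORD"), text));
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add(new Token("NUMBER", input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("unexpected char '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "CONF_\nblabla\nCLASH\nCONF_1\nCONFIGURATION\n";
        for (Token t : tokenize(input)) {
            System.out.println(t);
        }
    }
}
```

Run on the quoted sample input, main prints ["CONF_",TOK1], ["blabla",WORD], ["CLASH",WORD], ["CONF_",TOK1], ["1",NUMBER] and ["CONFIGURATION",WORD], one per line. The trade-off is that words become multi-character tokens instead of one ALPHANUM per character, and a word such as "CONF_X" would come out as a single WORD rather than TOK1 followed by 'X'.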

Monty

> -----Original Message-----
> From: dognogod [mailto:dognogod at yahoo.com]
> Sent: Friday, October 11, 2002 2:28 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] ANTLR : lexer question
> 
> 
> Hi,
> 
> I wrote the following lexer to illustrate my question.
> 
> ============================================
> class MyLexer extends Lexer;
> 
> options {
>     k = 5;
>     charVocabulary = '\3'..'\377';
> }
> 
> WS: ( ' '  | '\t' | ( '\r' | '\n' ) { newline (); })
>     { $setType(Token.SKIP); };
> 
> 
> TOK1: "CONF_"; 
> 
> ALPHANUM: ('a'..'z') | ('A'..'Z') | ('0'..'9');
> 
> ================================================
> 
> I use an input file containing:
> CONF_
> blabla
> CLASH
> CONF_1
> CONFIGURATION
> 
> 
> When I print all the tokens, I get the following:
> ["CONF_",<5>,line=1,col=1]
> ["b",<6>,line=2,col=1]
> ["l",<6>,line=2,col=2]
> ["a",<6>,line=2,col=3]
> ["b",<6>,line=2,col=4]
> ["l",<6>,line=2,col=5]
> ["a",<6>,line=2,col=6]
> ["C",<6>,line=3,col=1]
> ["L",<6>,line=3,col=2]
> ["A",<6>,line=3,col=3]
> ["S",<6>,line=3,col=4]
> ["H",<6>,line=3,col=5]
> ["CONF_",<5>,line=4,col=1]
> ["1",<6>,line=4,col=6]
> Exception TokenStreamRecognitionException 5 : expecting '_', found 'I'
> 
> It seems ANTLR always expects the characters "CONF" to be followed
> by the '_' char.
> 
> 
> I found a way around this problem by adding the following rule:
> NTOK1: ('C' ~'O' | "CO" ~'N' | "CON" ~'F' | "CONF" ~'_');
> 
> 
> If I print the tokens again, I no longer get an exception:
> 
> ["CONF_",<5>,line=1,col=1]
> ["b",<7>,line=2,col=1]
> ["l",<7>,line=2,col=2]
> ["a",<7>,line=2,col=3]
> ["b",<7>,line=2,col=4]
> ["l",<7>,line=2,col=5]
> ["a",<7>,line=2,col=6]
> ["CL",<6>,line=3,col=1]
> ["A",<7>,line=3,col=3]
> ["S",<7>,line=3,col=4]
> ["H",<7>,line=3,col=5]
> ["CONF_",<5>,line=4,col=1]
> ["1",<7>,line=4,col=6]
> ["CONFI",<6>,line=5,col=1]
> ["G",<7>,line=5,col=6]
> ["U",<7>,line=5,col=7]
> ["R",<7>,line=5,col=8]
> ["A",<7>,line=5,col=9]
> ["T",<7>,line=5,col=10]
> ["I",<7>,line=5,col=11]
> ["O",<7>,line=5,col=12]
> ["N",<7>,line=5,col=13]
> ["null",<1>,line=6,col=1]
> 
> However:
> 
> 1. it obliges me to define a "non" token for every token I define;
> 2. I don't get ALPHANUM tokens in the following two cases:
>   - "CLASH" is scanned as "CL" + 'A' + ... (instead of 'C' + 'L' + 'A' + ...)
>   - "CONFIGURATION" is scanned as "CONFI" + 'G' + ...
> 
> 
> My question: how can I define tokens such as
> TOK1: "CONF_" and make sure they don't interfere with input like
> "CONFIGURATION", "CLASH", ...?
> 
> 
> Thanks,
> 
> 
> Could you also reply to me directly (dognogod at yahoo.com)?
> 
> S.
> 
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
