[antlr-interest] ANTLR : lexer question
mzukowski at yci.com
Fri Oct 11 14:30:37 PDT 2002
See the documentation about the lexer's handling of literals.
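
The idea behind the literals handling in ANTLR 2 (the `testLiterals` option) is, as far as I understand it, to match one general word-like rule first and only then look the matched text up in a table of literals. Here is a minimal plain-Java sketch of that idea, not ANTLR-generated code. The class name `LiteralsSketch`, the `WORD` token type, the `text:type` output format, and the choice to allow `_` and digits inside a word are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LiteralsSketch {
    // Hypothetical token type numbers for this sketch (not ANTLR's generated constants).
    static final int TOK1 = 5;
    static final int WORD = 6;

    // Literals table: exact token text -> token type.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("CONF_", TOK1);
    }

    // Assumption: a word is made of letters, digits, and underscores.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    /** Splits the input into words and returns each token as "text:type". */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (!isWordChar(input.charAt(i))) {
                i++; // simplification: skip whitespace and any other character
                continue;
            }
            int start = i;
            while (i < input.length() && isWordChar(input.charAt(i))) {
                i++; // consume the longest run of word characters
            }
            String text = input.substring(start, i);
            // Only now decide the type: a literal if the whole text is in the table.
            int type = LITERALS.getOrDefault(text, WORD);
            tokens.add(text + ":" + type);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "CONF_\nblabla\nCLASH\nCONF_1\nCONFIGURATION\n";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

With this approach "CLASH" and "CONFIGURATION" each come out as a single word, and only the exact text "CONF_" gets the literal's type. Note that "CONF_1" also becomes one word here rather than "CONF_" followed by "1", so this only fits if that is the behavior you want.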
Monty
> -----Original Message-----
> From: dognogod [mailto:dognogod at yahoo.com]
> Sent: Friday, October 11, 2002 2:28 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] ANTLR : lexer question
>
>
> Hi,
>
> I wrote the following lexer to illustrate my question.
>
> ============================================
> class MyLexer extends Lexer;
>
> options {
> k = 5;
> charVocabulary = '\3'..'\377';
> }
>
> WS: ( ' ' | '\t' | ( '\r' | '\n' ) { newline (); })
> { $setType(Token.SKIP); };
>
>
> TOK1: "CONF_";
>
> ALPHANUM: ('a'..'z') | ('A'..'Z') | ('0'..'9');
>
> ================================================
>
> I use an input file containing:
> CONF_
> blabla
> CLASH
> CONF_1
> CONFIGURATION
>
>
> When I print all the tokens I get the following:
> ["CONF_",<5>,line=1,col=1]
> ["b",<6>,line=2,col=1]
> ["l",<6>,line=2,col=2]
> ["a",<6>,line=2,col=3]
> ["b",<6>,line=2,col=4]
> ["l",<6>,line=2,col=5]
> ["a",<6>,line=2,col=6]
> ["C",<6>,line=3,col=1]
> ["L",<6>,line=3,col=2]
> ["A",<6>,line=3,col=3]
> ["S",<6>,line=3,col=4]
> ["H",<6>,line=3,col=5]
> ["CONF_",<5>,line=4,col=1]
> ["1",<6>,line=4,col=6]
> Exception TokenStreamRecognitionException 5 : expecting '_', found 'I'
>
> It seems ANTLR always expects the "CONF" characters to be followed by
> the '_' char.
>
>
> I found a way to bypass this problem by adding the following rule:
> NTOK1: ('C' ~'O' | "CO" ~'N' | "CON" ~'F' | "CONF" ~'_');
>
>
> If I print the tokens again, I no longer get any exceptions:
>
> ["CONF_",<5>,line=1,col=1]
> ["b",<7>,line=2,col=1]
> ["l",<7>,line=2,col=2]
> ["a",<7>,line=2,col=3]
> ["b",<7>,line=2,col=4]
> ["l",<7>,line=2,col=5]
> ["a",<7>,line=2,col=6]
> ["CL",<6>,line=3,col=1]
> ["A",<7>,line=3,col=3]
> ["S",<7>,line=3,col=4]
> ["H",<7>,line=3,col=5]
> ["CONF_",<5>,line=4,col=1]
> ["1",<7>,line=4,col=6]
> ["CONFI",<6>,line=5,col=1]
> ["G",<7>,line=5,col=6]
> ["U",<7>,line=5,col=7]
> ["R",<7>,line=5,col=8]
> ["A",<7>,line=5,col=9]
> ["T",<7>,line=5,col=10]
> ["I",<7>,line=5,col=11]
> ["O",<7>,line=5,col=12]
> ["N",<7>,line=5,col=13]
> ["null",<1>,line=6,col=1]
>
> However:
>
> 1. this obliges me to define a "non" token for every token I define
> 2. I don't get the ALPHANUM token in the following two cases:
> - "CLASH" is scanned as "CL" + 'A' + ... (instead of 'C' 'L' 'A' ...)
> - "CONFIGURATION" is scanned as "CONFI" + 'G' + ...
>
>
> My question is the following: how can I define tokens such as
> TOK1: "CONF_" and make sure they don't interfere with things like
> "CONFIGURATION", "CLASH", ...?
>
>
> Thanks,
>
>
> Could you please also reply to me directly (dognogod at yahoo.com)?
>
> S.
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/