[antlr-interest] v3 lexer cannot tell keyword from identifier (very strange)
Martin d'Anjou
martin.danjou at neterion.com
Thu Feb 22 15:14:16 PST 2007
On Thu, 22 Feb 2007, Miguel Ping wrote:
> Doesn't it have to do with precedence? My (maybe stupid) guess is that
> ANTLR is trying to match int before trying to match int_id.
I tried putting the IDENTIFIER token definition first, and when I do that,
I get:
line 1:0 required (...)+ loop did not match anything at input 'int'
So I don't know what's going on. It's like the tokenizer is non-greedy for
some reason.
As I said, it is very strange.
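
The symptom is what you would see from a tokenizer that takes the first rule that matches, in declaration order, instead of the longest match. The Java sketch below is a minimal model of those two policies on three of the rules from the grammar. It is not ANTLR code and makes no claim about how ANTLR generates its lexers; the class TokenizerSketch and its methods are hypothetical names made up for this illustration. Whether the filter=true option in the grammar is what triggers this behaviour is an assumption worth testing, not something the sketch proves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a toy tokenizer, not ANTLR's algorithm.
class TokenizerSketch {
    // Rules in declaration order, mirroring INT, SEMI and IDENTIFIER from the grammar.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("INT", Pattern.compile("int"));
        RULES.put("SEMI", Pattern.compile(";"));
        RULES.put("IDENTIFIER", Pattern.compile("[a-z_]+"));
    }

    // First-match policy: the first rule (in declaration order) that matches wins.
    static List<String> firstMatch(String input) {
        return tokenize(input, false);
    }

    // Longest-match policy: the rule consuming the most characters wins;
    // on a tie, the earlier rule is kept.
    static List<String> longestMatch(String input) {
        return tokenize(input, true);
    }

    private static List<String> tokenize(String input, boolean greedy) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Skip whitespace instead of emitting a hidden WS token.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                    if (!greedy) {
                        break; // first match wins, later rules are never tried
                    }
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("int int_id;"));
        // prints [INT(int), INT(int), IDENTIFIER(_id), SEMI(;)]
        System.out.println(longestMatch("int int_id;"));
        // prints [INT(int), IDENTIFIER(int_id), SEMI(;)]
    }
}
```

Under the first-match policy the second line of the input produces INT, INT, IDENTIFIER, which would give exactly the "mismatched input 'int' expecting IDENTIFIER" complaint from the parser. Under the longest-match policy "int_id" comes out as a single IDENTIFIER.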
Martin
> On 2/22/07, Martin d'Anjou <martin.danjou at neterion.com> wrote:
>> Hi,
>>
>> I have a very strange problem in 3.0b6. Given the input text:
>>
>> int id;
>> int int_id;
>>
>> The error:
>>
>> line 2:4 mismatched input 'int' expecting IDENTIFIER
>>
>> It is mistaking "int_id" for "int", treating the underscore as a token
>> separator. The (ridiculous looking) lexer is:
>>
>> lexer grammar DUMMY_Lexer;
>> options { filter=true; }
>>
>> MOD : 'mod' ;
>> END : 'end' ;
>> DEF : 'def' ;
>> INC : 'inc' ;
>> PAR : 'par' ;
>> INP : 'inp' ;
>> OUT : 'out' ;
>> INO : 'ino' ;
>> INT : 'int' ;
>> WER : 'wer' ;
>> COMMA : ',' ;
>> SEMI : ';' ;
>> L_PAREN : '(' ;
>> R_PAREN : ')' ;
>> ASSIGN : '=' ;
>> SHARP : '#' ;
>> LSHIFT : '<<' ;
>> MULT : '*' ;
>> MINUS : '-' ;
>> PLUS : '+' ;
>> COLON : ':' ;
>> LTEQ : '<=' ;
>> L_CURLY : '{' ;
>> R_CURLY : '}' ;
>> OR : '|' ;
>> SQUARE : '[]' ;
>> QUOTE : '"' ;
>> DIGIT : '0' ;
>> WS : ( ' ' | EOL )+ {$channel=HIDDEN;} ;
>> EOL : ('\r\n'|'\r'|'\n') ;
>> LetterC : 'c' | Nothing ;
>> Nothing : 't' ;
>> SL_COMMENT :'a';
>> ML_COMMENT : '/' ;
>> BASE : 'b' ;
>> BASE_NUM : DIGIT+ (BASE DIGIT+)? ;
>>
>> IDENTIFIER : ('a'..'z'|UNDERSCORE)+ ;
>>
>> fragment
>> UNDERSCORE : '_' ;
>>
>> The only token I was able to take out was the QUESTION : '?'; token. When I
>> remove any other token (such as MOD), the error changes to:
>>
>> line 1:0 required (...)+ loop did not match anything at input 'int'
>>
>> Which makes it even weirder...
>>
>> Now the parser is fairly minimal:
>>
>> parser grammar DUMMY_Parser;
>> options {
>> tokenVocab=DUMMY_Lexer;
>> }
>>
>> source_text :
>> int_defs+
>> ;
>>
>> int_defs :
>> INT { System.out.print("int "); }
>> id=IDENTIFIER { System.out.print($id.text); }
>> SEMI { System.out.println(";"); }
>> ;
>>
>> Help!!! (and thanks!)
>> Martin
>>
>