[antlr-interest] fragment: simple (or naive) usage does not work

Martin d'Anjou martin.danjou at neterion.com
Wed Feb 28 06:16:45 PST 2007


> This is my input:
> int id;
> int int_id;
> int _int_id;
> 45b32
> 6h87z
>
> I have to parse those pesky numbers at the bottom. So I wrote the following 
> lexer:
>
> lexer grammar DUMMY_Lexer;
> INT          : 'int' ;
> SEMI         : ';' ;
> WS           :  (  ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;
>
> IDENTIFIER   :
>   ('a'..'z'|'A'..'Z'|'_')+ ;
>
> NUMBER : DIGIT+ (BASE ZNUM+)? ;
> fragment ZNUM : DIGIT|'z'|'Z' ;
> fragment BASE : 'b' | 'h';
> fragment DIGIT : '0'..'9';
>
> And of course the parser:
>
> parser grammar DUMMY_Parser;
> options {
>  tokenVocab=DUMMY_Lexer;
> }
>
> source_text :
>  { System.out.println("Weird lexer"); }
>  int_defs+
>  numbers+
>  ;
>
> int_defs :
>  INT            { System.out.print("int "); }
>  id=IDENTIFIER  { System.out.print($id.text); }
>  SEMI           { System.out.println(";"); }
>  ;
>
> numbers :
>  n=NUMBER         { System.out.println($n.text); }
>  ;
>
>
> Alas, I get:
> line 4:0 required (...)+ loop did not match anything at input '45b32'
>
> If I move ZNUM inside NUMBER, like this:
>
> NUMBER : DIGIT+ (BASE (DIGIT|'z'|'Z')+)? ;
>
> then it works. What's up with fragment lexer rules?

I found the problem. Because the parser uses tokenVocab, it takes its token 
types from the vocabulary generated for the lexer, so the parser has to be 
regenerated whenever the lexer changes. My Makefile was missing that 
dependency. I guess I'm just not used to working with the Java flow.
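
For anyone hitting the same thing, here is a minimal sketch of the dependency the build has to respect, written as a small Java check rather than a Makefile rule. It assumes the grammars live in DUMMY_Lexer.g and DUMMY_Parser.g and that the tool writes DUMMY_Lexer.tokens and DUMMY_Parser.java next to them; those file names and the ParserStaleness class are illustrative assumptions, not something taken from my actual build.

```java
import java.io.File;

// Hypothetical helper (not part of ANTLR): reports whether the generated
// parser is older than the files it depends on.
public class ParserStaleness {

    // True when the generated file is missing (timestamp 0) or older than
    // at least one of its inputs. Timestamps are File.lastModified() values.
    static boolean needsRebuild(long generated, long... inputs) {
        if (generated == 0L) {
            return true; // lastModified() returns 0 for a file that does not exist
        }
        for (long input : inputs) {
            if (input > generated) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed file names; adjust to the real layout of the project.
        File lexerTokens = new File("DUMMY_Lexer.tokens");
        File parserGrammar = new File("DUMMY_Parser.g");
        File parserJava = new File("DUMMY_Parser.java");

        // The parser depends on its own grammar AND on the lexer's token
        // vocabulary, because of tokenVocab=DUMMY_Lexer.
        boolean stale = needsRebuild(parserJava.lastModified(),
                                     parserGrammar.lastModified(),
                                     lexerTokens.lastModified());

        System.out.println(stale
                ? "DUMMY_Parser.java is stale: regenerate the parser"
                : "DUMMY_Parser.java is up to date");
    }
}
```

In a Makefile the same idea would be to list the lexer's .tokens file as a prerequisite of the generated parser, which is the dependency I had left out.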

Martin

