[antlr-interest] fragment: simple (or naive) usage does not work

Jim Idle jimi at intersystems.com
Tue Feb 27 11:33:55 PST 2007


Martin,

I have noticed something similar to this and am trying to distill a
decent example for Ter (maybe this is it). I think that the set handling
is failing when a fragment rule is used like this (I noticed it with
(~(FRAGMENT))* for instance). Hence it is trying to parse something but
it ain't what it should be. The workaround, as you say, is to avoid the
fragment rule in this case.

You might be able to see whether this is doing what you want by putting
this example in ANTLRWorks and clicking on the lexer rule. It will show
the character sequences it is going to look for, and I think you will
see that it gives a different sequence when you use a fragment. That
is what is going wrong.

Jim

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Martin d'Anjou
Sent: Tuesday, February 27, 2007 11:27 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] fragment: simple (or naive) usage does not work

Hi,

This is my input:
int id;
int int_id;
int _int_id;
45b32
6h87z

I have to parse those pesky numbers at the bottom. So I wrote the
following lexer:

lexer grammar DUMMY_Lexer;
INT          : 'int' ;
SEMI         : ';' ;
WS           :  (  ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;

IDENTIFIER   :
    ('a'..'z'|'A'..'Z'|'_')+ ;

NUMBER : DIGIT+ (BASE ZNUM+)? ;
fragment ZNUM : DIGIT|'z'|'Z' ;
fragment BASE : 'b' | 'h';
fragment DIGIT : '0'..'9';

And of course the parser:

parser grammar DUMMY_Parser;
options {
   tokenVocab=DUMMY_Lexer;
}

source_text :
   { System.out.println("Weird lexer"); }
   int_defs+
   numbers+
   ;

int_defs :
   INT            { System.out.print("int "); }
   id=IDENTIFIER  { System.out.print($id.text); }
   SEMI           { System.out.println(";"); }
   ;

numbers :
   n=NUMBER         { System.out.println($n.text); }
   ;


Alas, I get:
line 4:0 required (...)+ loop did not match anything at input '45b32'

If I move ZNUM inside NUMBER, like this:

NUMBER : DIGIT+ (BASE (DIGIT|'z'|'Z')+)? ;

then it works. What's up with fragment lexer rules?
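
As a sanity check that does not depend on ANTLR at all, the token shape
that NUMBER is meant to accept can be written as a plain Java regex and
tried against the sample input. This is only a sketch: the class name
NumberShapeCheck and the method isNumber are made up for illustration
and are not part of the grammar or of the ANTLR runtime. It says nothing
about why the fragment version fails; it only pins down what both
versions of the rule should accept.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the grammar): mirrors the intended
// NUMBER rule as a regular expression so the expected matches are explicit.
class NumberShapeCheck {
    // NUMBER : DIGIT+ (BASE ZNUM+)? ;
    //   DIGIT = '0'..'9', BASE = 'b' | 'h', ZNUM = DIGIT | 'z' | 'Z'
    private static final Pattern NUMBER =
        Pattern.compile("[0-9]+(?:[bh][0-9zZ]+)?");

    // True only when the whole string has the NUMBER shape.
    static boolean isNumber(String text) {
        return NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The two numbers from the input, plus a few contrast cases.
        String[] samples = { "45b32", "6h87z", "123", "int_id", "45b" };
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

Both 45b32 and 6h87z have the NUMBER shape here, so the inlined rule
and the fragment-based rule ought to tokenize them identically.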

Thanks,
Martin

