[antlr-interest] fragment: simple (or naive) usage does not work

Terence Parr parrt at cs.usfca.edu
Wed Mar 14 10:46:57 PDT 2007


Hi.  Works on my latest version.  I copied your lexer into T.g and  
made a test file:

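import java.util.List;
import org.antlr.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // read the characters to lex from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);

        // create a lexer that feeds off of input CharStream
        TLexer lexer = new TLexer(input);

        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // print every token, including those on the hidden channel
        List a = tokens.getTokens();
        for (int i = 0; i < a.size(); i++) {
            System.out.println(a.get(i));
        }
    }
}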

Lexer:

lexer grammar T;
INT          : 'int' ;
SEMI         : ';' ;
WS           :  (  ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;

IDENTIFIER   :
    ('a'..'z'|'A'..'Z'|'_')+ ;

NUMBER : DIGIT+ (BASE ZNUM+)? ;
fragment ZNUM : DIGIT|'z'|'Z' ;
fragment BASE : 'b' | 'h';
fragment DIGIT : '0'..'9';
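ANTLR turns each fragment rule into a helper method that only other lexer rules call. A fragment never yields a token by itself, so NUMBER : DIGIT+ (BASE ZNUM+)? ; should recognize exactly what the inlined variant in Martin's message (quoted below) recognizes. The Java below is a hand-written sketch of that idea, not ANTLR's generated TLexer.java. The names NumberSketch, matchNumber and matchNumberInlined are hypothetical. Entering the optional (BASE ZNUM+) part only when BASE is followed by a ZNUM is a lookahead choice made for this sketch and may differ from the decision ANTLR generates.

```java
// Hand-written sketch (not ANTLR output) of
//   NUMBER : DIGIT+ (BASE ZNUM+)? ;
// with each fragment rule written as a helper that never emits a token.
class NumberSketch {

    // fragment DIGIT : '0'..'9';
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // fragment BASE : 'b' | 'h';
    static boolean isBase(char c) { return c == 'b' || c == 'h'; }

    // fragment ZNUM : DIGIT | 'z' | 'Z';
    static boolean isZnum(char c) { return isDigit(c) || c == 'z' || c == 'Z'; }

    // Fragment style: returns the index just past the NUMBER starting at pos,
    // or pos itself when no NUMBER starts there.
    static int matchNumber(String s, int pos) {
        int i = pos;
        while (i < s.length() && isDigit(s.charAt(i))) i++;      // DIGIT+
        if (i == pos) return pos;                                 // no DIGIT, no NUMBER
        // (BASE ZNUM+)? : entered only if BASE is followed by a ZNUM
        // (two-character lookahead chosen for this sketch)
        if (i + 1 < s.length() && isBase(s.charAt(i)) && isZnum(s.charAt(i + 1))) {
            i += 2;                                               // BASE and the first ZNUM
            while (i < s.length() && isZnum(s.charAt(i))) i++;    // any further ZNUMs
        }
        return i;
    }

    // Inlined style, as in: NUMBER : DIGIT+ (BASE (DIGIT|'z'|'Z')+)? ;
    static int matchNumberInlined(String s, int pos) {
        int i = pos;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
        if (i == pos) return pos;
        if (i + 1 < s.length() && (s.charAt(i) == 'b' || s.charAt(i) == 'h')) {
            char c = s.charAt(i + 1);
            if ((c >= '0' && c <= '9') || c == 'z' || c == 'Z') {
                i += 2;
                while (i < s.length()) {
                    char d = s.charAt(i);
                    if ((d >= '0' && d <= '9') || d == 'z' || d == 'Z') i++;
                    else break;
                }
            }
        }
        return i;
    }

    public static void main(String[] args) {
        String[] samples = {"45b32", "6h87z", "123", "45b;", "x9"};
        for (String s : samples) {
            System.out.println(s + " -> fragment=" + matchNumber(s, 0)
                    + " inlined=" + matchNumberInlined(s, 0));
        }
    }
}
```

Running main prints the same end index for both variants on every sample (5, 5, 3, 2 and 0), which is the point: moving ZNUM inline should not change what NUMBER matches.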

Got this output:

/tmp $ java Test
int id;
int int_id;
int _int_id;
45b32
6h87z
[@0,0:2='int',<4>,1:0]
[@1,3:3=' ',<6>,channel=99,1:3]
[@2,4:5='id',<7>,1:4]
[@3,6:6=';',<5>,1:6]
[@4,7:7='\n',<6>,channel=99,1:7]
[@5,8:10='int',<4>,2:0]
[@6,11:11=' ',<6>,channel=99,2:3]
[@7,12:17='int_id',<7>,2:4]
[@8,18:18=';',<5>,2:10]
[@9,19:19='\n',<6>,channel=99,2:11]
[@10,20:22='int',<4>,3:0]
[@11,23:23=' ',<6>,channel=99,3:3]
[@12,24:30='_int_id',<7>,3:4]
[@13,31:31=';',<5>,3:11]
[@14,32:32='\n',<6>,channel=99,3:12]
[@15,33:37='45b32',<11>,4:0]
[@16,38:38='\n',<6>,channel=99,4:5]
[@17,39:43='6h87z',<11>,5:0]
[@18,44:44='\n',<6>,channel=99,5:5]
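Each line above is one token printed by the test program. Reading it back: @index is the token's position in the stream. start:stop are inclusive character offsets into the input (33:37 covers the five characters of 45b32). <type> is the token type, and line:column is where the token starts (lines counted from 1, columns from 0). Tokens sent off the default channel by the WS action show channel=99, 99 being the value of HIDDEN in the ANTLR 3 runtime. Tokens @15 and @17 show 45b32 and 6h87z each lexed as a single type-11 token, i.e. as NUMBER. The sketch below rebuilds that layout from its fields. TokenLine and format are hypothetical names, and the escaping of \n, \r and \t is inferred from the '\n' entries above rather than copied from the runtime.

```java
// Sketch of the token display layout seen in the listing:
//   [@index,start:stop='text',<type>,channel=N,line:column]
class TokenLine {

    static String format(int index, int start, int stop, String text,
                         int type, int channel, int line, int column) {
        // make whitespace visible, as in the '\n' tokens of the listing
        String shown = text.replace("\n", "\\n")
                           .replace("\r", "\\r")
                           .replace("\t", "\\t");
        // the channel is only printed for tokens off the default channel 0
        String channelPart = channel > 0 ? ",channel=" + channel : "";
        return "[@" + index + "," + start + ":" + stop + "='" + shown + "',<"
                + type + ">" + channelPart + "," + line + ":" + column + "]";
    }

    public static void main(String[] args) {
        // field values copied from tokens @15 and @16 of the listing
        System.out.println(format(15, 33, 37, "45b32", 11, 0, 4, 0));
        System.out.println(format(16, 38, 38, "\n", 6, 99, 4, 5));
    }
}
```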

Let me know if you want my TLexer.java file so we can compare.

Ter

On Feb 27, 2007, at 11:26 AM, Martin d'Anjou wrote:

> Hi,
>
> This is my input:
> int id;
> int int_id;
> int _int_id;
> 45b32
> 6h87z
>
> I have to parse those pesky numbers at the bottom. So I wrote the
> following lexer:
>
> lexer grammar DUMMY_Lexer;
> INT          : 'int' ;
> SEMI         : ';' ;
> WS           :  (  ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;
>
> IDENTIFIER   :
>    ('a'..'z'|'A'..'Z'|'_')+ ;
>
> NUMBER : DIGIT+ (BASE ZNUM+)? ;
> fragment ZNUM : DIGIT|'z'|'Z' ;
> fragment BASE : 'b' | 'h';
> fragment DIGIT : '0'..'9';
>
> And of course the parser:
>
> parser grammar DUMMY_Parser;
> options {
>   tokenVocab=DUMMY_Lexer;
> }
>
> source_text :
>   { System.out.println("Weird lexer"); }
>   int_defs+
>   numbers+
>   ;
>
> int_defs :
>   INT            { System.out.print("int "); }
>   id=IDENTIFIER  { System.out.print($id.text); }
>   SEMI           { System.out.println(";"); }
>   ;
>
> numbers :
>   n=NUMBER         { System.out.println($n.text); }
>   ;
>
>
> Alas, I get:
> line 4:0 required (...)+ loop did not match anything at input '45b32'
>
> If I move ZNUM inside NUMBER, like this:
>
> NUMBER : DIGIT+ (BASE (DIGIT|'z'|'Z')+)? ;
>
> then it works. What's up with fragment lexer rules?
>
> Thanks,
> Martin
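The error Martin reports comes from the numbers+ subrule of source_text. A (...)+ loop must match its body at least once. If the token at line 4:0 ('45b32') does not reach the parser as a NUMBER, the loop matches nothing and the parser reports that the required loop did not match. In Ter's run the same text arrives as a single type-11 token and the loop succeeds. Below is a minimal recursive-descent sketch of that behaviour, assuming the token list has already been filtered to the default channel (WS tokens are hidden, so the parser never sees them). SourceTextSketch, Tok and oneOrMore are hypothetical names. The type codes 4, 5, 7 and 11 are taken from Ter's listing as INT, SEMI, IDENTIFIER and NUMBER, whereas a real ANTLR parser gets its codes from the lexer through tokenVocab. The error text imitates the shape of ANTLR's message rather than reproducing the runtime's code.

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent sketch of DUMMY_Parser (hypothetical names).
class SourceTextSketch {
    // type codes as they appear in Ter's token listing (assumed mapping)
    static final int INT = 4, SEMI = 5, IDENTIFIER = 7, NUMBER = 11;

    // a default-channel token as the parser sees it
    static final class Tok {
        final int type, line, col;
        final String text;
        Tok(int type, String text, int line, int col) {
            this.type = type; this.text = text; this.line = line; this.col = col;
        }
    }

    private final List<Tok> toks;
    private int p = 0; // index of the next token to consume

    SourceTextSketch(List<Tok> toks) { this.toks = toks; }

    // type of the next token, or -1 at end of input
    private int la() { return p < toks.size() ? toks.get(p).type : -1; }

    private void match(int type) {
        if (la() != type) throw new IllegalStateException("mismatched token at index " + p);
        p++;
    }

    // (...)+ : run body while the next token can start it; zero runs is an error
    private int oneOrMore(int firstType, Runnable body) {
        int count = 0;
        while (la() == firstType) { body.run(); count++; }
        if (count == 0) {
            if (p >= toks.size())
                throw new IllegalStateException("required (...)+ loop did not match anything at end of input");
            Tok t = toks.get(p);
            throw new IllegalStateException("line " + t.line + ":" + t.col
                    + " required (...)+ loop did not match anything at input '" + t.text + "'");
        }
        return count;
    }

    // int_defs : INT IDENTIFIER SEMI ;
    private void intDefs() { match(INT); match(IDENTIFIER); match(SEMI); }

    // numbers : NUMBER ;
    private void numbers() { match(NUMBER); }

    // source_text : int_defs+ numbers+ ;  returns how many NUMBER tokens matched
    int sourceText() {
        oneOrMore(INT, this::intDefs);
        return oneOrMore(NUMBER, this::numbers);
    }

    public static void main(String[] args) {
        List<Tok> ok = Arrays.asList(
                new Tok(INT, "int", 1, 0), new Tok(IDENTIFIER, "id", 1, 4), new Tok(SEMI, ";", 1, 6),
                new Tok(NUMBER, "45b32", 4, 0), new Tok(NUMBER, "6h87z", 5, 0));
        System.out.println("numbers matched: " + new SourceTextSketch(ok).sourceText());

        // same shape, but '45b32' arrives with a type the parser does not
        // treat as NUMBER (8 is an arbitrary, hypothetical code)
        List<Tok> bad = Arrays.asList(
                new Tok(INT, "int", 1, 0), new Tok(IDENTIFIER, "id", 1, 4), new Tok(SEMI, ";", 1, 6),
                new Tok(8, "45b32", 4, 0));
        try {
            new SourceTextSketch(bad).sourceText();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints "numbers matched: 2" for the first list and "line 4:0 required (...)+ loop did not match anything at input '45b32'" for the second. Only one int_defs line is included for brevity, with positions copied from Ter's listing.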


