[antlr-interest] Losing characters when choosing a less strong alternative

Louis Onrust louisonrust at gmail.com
Wed Feb 25 00:55:51 PST 2009


Hi,

I'm working on a code generator, and I want it to recognise both
"unsigned long long" and "unsigned long". That's where it goes wrong.

This is a heavily trimmed snippet of my grammar, but it still reproduces
the problem:

===
grammar test1;

options { backtrack = true; }

program        : class_statement+ program?      ;
   
class_statement: 'public' attribute_type ID ';' ;
   
attribute_ID   : ID                             ;

attribute_type
    : 'unsigned long long'
    | 'unsigned long'
    ;

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*  ;
WS : ( ' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
===

This input is recognised correctly:
public unsigned long long objectId;
public unsigned long long userId;

But this input is not:
public unsigned long long objectId;
public unsigned long userId;
Instead, the debugger shows the input as:
public unsigned long long objectId;
public serId;

I thought that by putting the longer alternative first, it would still
recognise "unsigned long long". What I don't understand is why it then
fails to recognise "unsigned long": the first alternative doesn't fit,
so I'd expect it to try the other alternatives to see which one does.
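One way to picture how characters can go missing is the toy model below. It is a sketch under stated assumptions, not ANTLR's generated lexer: it assumes the lexer chooses between the two literals by looking one character past the shared prefix "unsigned long", commits to 'unsigned long long' when that character is a space, and, when the third word then turns out not to be "long", reports an error and drops everything it consumed plus the offending character. The function `tokenize` and the constants are hypothetical names for illustration.

```python
# Toy model (hypothetical, NOT ANTLR's real lexer) of a lexer that predicts
# which literal to match with limited lookahead and then commits to it.

LONG_LIT = "unsigned long long"
SHORT_LIT = "unsigned long"


def tokenize(text):
    """Return (tokens, errors) for a tiny subset of the test1 grammar."""
    tokens, errors = [], []
    i = 0
    while i < len(text):
        c = text[i]
        if c in " \t\r\n":
            i += 1  # WS goes to the hidden channel: just skip it
        elif text.startswith(SHORT_LIT, i):
            # Assumed prediction: look one character past the shared prefix.
            # A space there is taken as evidence for the longer literal.
            nxt = i + len(SHORT_LIT)
            lit = LONG_LIT if text[nxt:nxt + 1] == " " else SHORT_LIT
            j = 0
            while j < len(lit) and i + j < len(text) and text[i + j] == lit[j]:
                j += 1
            if j == len(lit):
                tokens.append(lit)
                i += j
            else:
                # Mismatch after committing: record the error, then recover by
                # dropping what was consumed plus the offending character.
                errors.append((i + j, text[i + j:i + j + 1]))
                i += j + 1
        elif c.isalpha() or c == "_":
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])  # ID (keywords like 'public' included)
            i = j
        else:
            tokens.append(c)  # single-character token such as ';'
            i += 1
    return tokens, errors


if __name__ == "__main__":
    src = "public unsigned long long objectId;\npublic unsigned long userId;"
    tokens, errors = tokenize(src)
    print(" ".join(tokens))  # public unsigned long long objectId ; public serId ;
    print(errors)
```

In this model the decision is made while the characters are being split into tokens, so the order of the alternatives in the attribute_type rule never comes into play, and the second line comes out as "public serId;", the same shape as what the debugger shows.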

Can anyone shed some light on this?

louis

