[antlr-interest] Losing characters when choosing a less strong alternative

Indhu Bharathi indhu.b at s7software.com
Wed Feb 25 01:22:11 PST 2009


attribute_type
    : 'unsigned long long'
    | 'unsigned long'
    ;

Although you have written 'unsigned long long' and 'unsigned long' in a parser rule, ANTLR still turns each literal into an implicit lexer rule (a token). So the choice between them is made by the lexer, not the parser.

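After seeing 'unsigned long', the lexer notices that a space follows and tries to go for the longer match ('unsigned long long'). But after consuming the space it finds a mismatch (in your second line the next character is 'u', not 'l') and reports an error. It does not go back and try the shorter token, so everything consumed up to that point ('unsigned long ' and then the offending 'u') is lost. That is why the debugger shows 'public serId'.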
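
To make this failure mode concrete, here is a small Python model of a lexer that picks a token by peeking a limited distance ahead and then commits to it. It is a simplified sketch under stated assumptions, not ANTLR's generated code: the names `tokenize`, `SHORT` and `LONG` are hypothetical, and the prediction rule (a space after "unsigned long" means "unsigned long long") and the recovery step (skip one more character) are modelling choices meant to reproduce the debugger output from the question.

```python
SHORT = "unsigned long"        # implicit token from the second alternative
LONG = "unsigned long long"    # implicit token from the first alternative

def tokenize(text):
    """Hypothetical model of a committing lexer; returns (tokens, errors)."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1                      # WS is skipped (hidden channel in the grammar)
        elif ch == ";":
            tokens.append(";")
            pos += 1
        elif text.startswith(SHORT, pos):
            # Assumed prediction: a space right after "unsigned long" is only
            # viable for LONG, so the model commits to LONG without looking further.
            follow = text[pos + len(SHORT):pos + len(SHORT) + 1]
            literal = LONG if follow == " " else SHORT
            start = pos
            matched = True
            for expected in literal:
                if pos < len(text) and text[pos] == expected:
                    pos += 1              # characters are consumed as they match
                else:
                    matched = False
                    break
            if matched:
                tokens.append(literal)
            else:
                # No backtracking: what was consumed stays consumed.
                errors.append(f"mismatch at {pos} after {text[start:pos]!r}")
                pos += 1                  # assumed recovery: skip the bad character too
        elif ch.isalpha() or ch == "_":
            start = pos
            while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
                pos += 1
            tokens.append(text[start:pos])   # ID
        else:
            errors.append(f"unexpected {ch!r} at {pos}")
            pos += 1
    return tokens, errors

tokens, errors = tokenize("public unsigned long long objectId;\npublic unsigned long userId;")
print(tokens)
print(errors)
```

For the failing input the model's second declaration comes out as `public serId ;`, the same token stream the debugger showed: 'unsigned long ' was swallowed by the failed attempt at the longer literal, and the 'u' was skipped during recovery.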

A more robust and elegant way to do what you need is to make 'unsigned' and 'long' separate tokens and let the parser combine them:


attribute_type
    : UNSIGNED LONG LONG
    | UNSIGNED LONG
    ;

UNSIGNED	:	'unsigned'
	;
	
LONG	:	'long'
	;
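
With separate tokens, the decision moves from the lexer to the parser, which compares whole tokens instead of committing character by character. A minimal Python sketch of that idea follows; the whitespace-splitting `lex`, the `PUBLIC` token type and the function names are assumptions for illustration, and the hand-written check in `attribute_type` only stands in for ANTLR's own lookahead machinery. (When adding these rules, note that, as far as I know, ANTLR 3 resolves two lexer rules matching the same text in favour of the one defined first, so UNSIGNED and LONG should come before ID.)

```python
KEYWORDS = {"public": "PUBLIC", "unsigned": "UNSIGNED", "long": "LONG"}

def lex(text):
    """Hypothetical lexer: one token per word; keywords get their own type."""
    tokens = []
    for word in text.replace(";", " ; ").split():
        if word == ";":
            tokens.append((";", word))
        else:
            tokens.append((KEYWORDS.get(word, "ID"), word))
    return tokens

def attribute_type(tokens, i):
    """attribute_type : UNSIGNED LONG LONG | UNSIGNED LONG ;
    Peeks at up to three token types before choosing; peeking consumes nothing."""
    kinds = [kind for kind, _ in tokens[i:i + 3]]
    if kinds == ["UNSIGNED", "LONG", "LONG"]:
        return "unsigned long long", i + 3
    if kinds[:2] == ["UNSIGNED", "LONG"]:
        return "unsigned long", i + 2
    raise SyntaxError(f"expected attribute_type at token {i}")

def parse_program(text):
    """Repeated class_statement : 'public' attribute_type ID ';' ;"""
    tokens = lex(text)
    decls, i = [], 0
    while i < len(tokens):
        if tokens[i][0] != "PUBLIC":
            raise SyntaxError(f"expected 'public' at token {i}")
        type_name, i = attribute_type(tokens, i + 1)
        if i + 1 >= len(tokens) or tokens[i][0] != "ID" or tokens[i + 1][0] != ";":
            raise SyntaxError(f"expected ID ';' at token {i}")
        decls.append((type_name, tokens[i][1]))
        i += 2
    return decls

print(parse_program("public unsigned long long objectId;\npublic unsigned long userId;"))
# [('unsigned long long', 'objectId'), ('unsigned long', 'userId')]
```

Because trying the first alternative only looks at tokens and consumes nothing, falling back to UNSIGNED LONG loses no input.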

Hope that helps. You can take a look at the 'Problem when parsing numerics' thread to understand more about why this happens.

- Indhu


----- Original Message -----
From: Louis Onrust <louisonrust at gmail.com>
To: antlr-interest at antlr.org
Sent: Wednesday, February 25, 2009 2:25:51 PM GMT+0530 Asia/Calcutta
Subject: [antlr-interest] Losing characters when choosing a less strong alternative

Hi,

I'm working on a kind of code generator, and I want it to recognise
both "unsigned long long" and "unsigned long". But that's where it
went wrong.

This is a heavily edited snippet of my grammar, but it still contains
the "error":

===
grammar test1;

options { backtrack = true; }

program        : class_statement+ program?      ;
   
class_statement: 'public' attribute_type ID ';' ;
   
attribute_ID   : ID                             ;

attribute_type
    : 'unsigned long long'
    | 'unsigned long'
    ;

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*  ;
WS : ( ' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
===

The problem is that this input is recognised:
    public unsigned long long objectId;
    public unsigned long long userId;

But this input is not:
    public unsigned long long objectId;
    public unsigned long userId;

The debugger then shows the input as:
    public unsigned long long objectId;
    public serId;

I thought that putting the longer alternative first would make it
recognise "unsigned long long" anyway. But I don't understand why it
doesn't recognise "unsigned long": the first alternative doesn't fit,
so I'd expect it to try the other alternatives to see which one fits.

Can anyone shed some light on this?

louis



