[antlr-interest] a simple (not for me :)) grammar problem

Fırat Küçük firatkucuk at gmail.com
Mon Jan 7 00:20:41 PST 2008


This is my simple solution.

The original sample grammar:
grammar Sample;

start           :  (FLOAT | INTEGER) DOT IDENTIFIER;

FLOAT           :  NUMBER DOT NUMBER;
INTEGER         :  NUMBER;
IDENTIFIER      :  LETTER+;
DOT             :  '.';
WHITESPACE      :  (' ' | '\t')+ {$channel = HIDDEN;};
fragment NUMBER :  DIGIT+;
fragment LETTER :  'a' .. 'z';
fragment DIGIT  :  '0' .. '9';

I converted the FLOAT and INTEGER lexer rules into parser rules so that
backtracking (automatic syntactic predicates) can be used:


grammar Sample;

start
options {backtrack = true;}
        :  (floatLiteral | integerLiteral) DOT IDENTIFIER
        ;

floatLiteral    :  NUMBER DOT NUMBER;
integerLiteral  :  NUMBER;
IDENTIFIER      :  LETTER+;
DOT             :  '.';
WHITESPACE      :  (' ' | '\t')+ {$channel = HIDDEN;};
NUMBER          :  DIGIT+;
fragment LETTER :  'a' .. 'z';
fragment DIGIT  :  '0' .. '9';


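It parses:
3.hello
and
3.4.hello

But the new problem is that it also parses:

3   .   4   .  hello

Because WHITESPACE goes to the HIDDEN channel, the parser never sees the
spaces between NUMBER, DOT and NUMBER. Float literals should be adjacent,
as in the Java grammar.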
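A minimal Python sketch of one way to enforce adjacency at the parser level, assuming each token keeps its character offsets even though whitespace is hidden: the float alternative is tried first and accepted only when NUMBER, DOT and NUMBER touch; otherwise parsing falls back to the integer alternative. `Token`, `tokenize`, `adjacent` and `parse_start` are hypothetical names, and this is a hand-written illustration, not ANTLR-generated code or ANTLR API.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    text: str
    start: int  # offset of the first character
    stop: int   # offset just past the last character


# One named group per token type; WS is matched but not emitted.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>[0-9]+)|(?P<IDENTIFIER>[a-z]+)|(?P<DOT>\.)|(?P<WS>[ \t]+)"
)


def tokenize(text):
    """Return default-channel tokens; whitespace is skipped like HIDDEN."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append(Token(m.lastgroup, m.group(), m.start(), m.end()))
        pos = m.end()
    return tokens


def adjacent(a, b):
    """True when no hidden text (e.g. spaces) lies between tokens a and b."""
    return a.stop == b.start


def parse_start(text):
    """start : (floatLiteral | integerLiteral) DOT IDENTIFIER ;

    floatLiteral (NUMBER DOT NUMBER) is tried first, as backtracking would,
    but is accepted only if its three tokens are adjacent.
    """
    toks = tokenize(text)
    types = [t.type for t in toks]
    if (types == ["NUMBER", "DOT", "NUMBER", "DOT", "IDENTIFIER"]
            and adjacent(toks[0], toks[1]) and adjacent(toks[1], toks[2])):
        return ("float", "".join(t.text for t in toks[:3]), toks[4].text)
    if types == ["NUMBER", "DOT", "IDENTIFIER"]:
        return ("integer", toks[0].text, toks[2].text)
    raise SyntaxError(f"no viable alternative for {text!r}")
```

Spaces around the separating DOT (`3 . hello`) are still accepted here; only the inside of the float literal is required to be contiguous, so `3   .   4   .  hello` raises `SyntaxError`.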


2008/1/7, Gavin Lambert <antlr at mirality.co.nz>:
> At 16:26 7/01/2008, Mark Volkmann wrote:
>  >It should be easy right. Ter already gave the hint that the
>  >problem is that it was greedily grabbing the DOT for FLOAT
>  >instead of leaving it for the separator between the number
>  >and the identifier. Piece of cake? Well I've tried several
>  >things I thought would work to no avail.
>  >Why in the world doesn't this work?
> [...]
>  > backtrack = true; // I shouldn't need this, but I don't think it
>  >can hurt.
>
> It's not going to help, either.  "backtrack = true" has no effect
> on the lexer.
>
>  >FLOAT: NUMBER DOT NUMBER;
>  >INTEGER: NUMBER;
>  >IDENTIFIER: LETTER+;
>  >DOT: '.';
>  >fragment NUMBER: DIGIT+;
>  >fragment LETTER: 'a' .. 'z';
>  >fragment DIGIT: '0' .. '9';
>
> This has been discussed to death before.  For reasons of
> performance (and some other obscure thing, I think), when
> processing a + loop ANTLR will use k=1 lookahead.  Thus when faced
> with the choice between FLOAT and INTEGER, it looks ahead to see
> at least one DIGIT and then says "ok, that's a FLOAT".  It doesn't
> look past all the DIGITs to see whether there's a DOT or
> not.  (Ter has said he might look into improving this a bit in a
> later version.)
>
> Whenever there's a common prefix in your tokens, you will need to
> combine the rules to remove the ambiguity:
>
> INTEGER
>    : NUMBER
>      ( /* nothing afterwards */
>      | DOT NUMBER { $type = FLOAT; }
>      )
>    ;
>
>
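Gavin's left-factored rule can be mirrored in a hand-written Python lexer to see the idea: NUMBER is consumed once, and only afterwards is the INTEGER-or-FLOAT decision made. This is a sketch, not what ANTLR generates; the check that a digit immediately follows the '.' is an assumption added here so that `3.hello` still lexes as INTEGER DOT IDENTIFIER, and `lex` is a hypothetical name.

```python
DIGITS = "0123456789"


def lex(text):
    """Tokenize text into (type, text) pairs for the Sample grammar.

    NUMBER is read once; only afterwards do we decide between INTEGER and
    FLOAT, which is the left-factoring idea from the INTEGER rule above.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t":
            i += 1  # WHITESPACE: dropped here, like $channel = HIDDEN
        elif c in DIGITS:
            start = i
            while i < n and text[i] in DIGITS:  # NUMBER : DIGIT+
                i += 1
            kind = "INTEGER"
            # Take the '.' only if a digit follows it directly
            # (two characters of lookahead), then read the fraction.
            if i + 1 < n and text[i] == "." and text[i + 1] in DIGITS:
                i += 1
                while i < n and text[i] in DIGITS:
                    i += 1
                kind = "FLOAT"
            tokens.append((kind, text[start:i]))
        elif c == ".":
            tokens.append(("DOT", c))
            i += 1
        elif "a" <= c <= "z":
            start = i
            while i < n and "a" <= text[i] <= "z":  # IDENTIFIER : LETTER+
                i += 1
            tokens.append(("IDENTIFIER", text[start:i]))
        else:
            raise SyntaxError(f"unexpected character {c!r} at offset {i}")
    return tokens


print(lex("3.4.hello"))  # [('FLOAT', '3.4'), ('DOT', '.'), ('IDENTIFIER', 'hello')]
print(lex("3.hello"))    # [('INTEGER', '3'), ('DOT', '.'), ('IDENTIFIER', 'hello')]
```

Because FLOAT stays a single lexer token, whitespace cannot appear inside it: with the original parser rule `start : (FLOAT | INTEGER) DOT IDENTIFIER;`, the input `3 . 4 . hello` lexes as INTEGER DOT INTEGER DOT IDENTIFIER and is rejected.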


-- 
Lecturer Fırat Küçük
ADAMYO Distance Learning
SAKARYA University / TURKEY

