[antlr-interest] a simple (not for me :)) grammar problem
Fırat Küçük
firatkucuk at gmail.com
Mon Jan 7 00:20:41 PST 2008
This is my simple solution.
The original sample grammar:
grammar Sample;
start : (FLOAT | INTEGER) DOT IDENTIFIER;
FLOAT : NUMBER DOT NUMBER;
INTEGER : NUMBER;
IDENTIFIER : LETTER+;
DOT : '.';
WHITESPACE : (' ' | '\t')+ {$channel = HIDDEN;};
fragment NUMBER : DIGIT+;
fragment LETTER : 'a' .. 'z';
fragment DIGIT : '0' .. '9';
I can convert the FLOAT and INTEGER lexer rules into parser rules, so that
I can use syntactic predicates (backtracking):
grammar Sample;
start
options {backtrack = true;}
: (floatLiteral | integerLiteral) DOT IDENTIFIER
;
floatLiteral : NUMBER DOT NUMBER;
integerLiteral : NUMBER;
IDENTIFIER : LETTER+;
DOT : '.';
WHITESPACE : (' ' | '\t')+ {$channel = HIDDEN;};
NUMBER : DIGIT+;
fragment LETTER : 'a' .. 'z';
fragment DIGIT : '0' .. '9';
It parses
3.hello
and
3.4.hello
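A minimal sketch of what backtrack = true does for the (floatLiteral | integerLiteral) subrule, written as hand-written Java rather than ANTLR-generated code. The class, enum and method names are hypothetical, and the input is already reduced to the token types the parser sees. The first alternative is tried speculatively, the token position is rewound, and the parser then commits to whichever alternative matched.

```java
import java.util.List;

// Hypothetical hand-written sketch (not ANTLR output) of backtracking for:
//   start : (floatLiteral | integerLiteral) DOT IDENTIFIER ;
public class BacktrackSketch {
    enum T { NUMBER, DOT, IDENTIFIER }

    private final List<T> tokens;
    private int pos = 0;

    BacktrackSketch(List<T> tokens) { this.tokens = tokens; }

    // Consume one token if it has the expected type.
    private boolean match(T expected) {
        if (pos < tokens.size() && tokens.get(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private boolean floatLiteral() { return match(T.NUMBER) && match(T.DOT) && match(T.NUMBER); }

    private boolean integerLiteral() { return match(T.NUMBER); }

    boolean start() {
        // Speculatively try the first alternative, then rewind either way.
        int mark = pos;
        boolean firstAltMatches = floatLiteral();
        pos = mark;
        // Commit to the alternative chosen by the speculation.
        boolean ok = firstAltMatches ? floatLiteral() : integerLiteral();
        return ok && match(T.DOT) && match(T.IDENTIFIER);
    }

    static boolean parses(T... tokens) {
        return new BacktrackSketch(List.of(tokens)).start();
    }

    public static void main(String[] args) {
        // 3.hello   -> NUMBER DOT IDENTIFIER
        System.out.println(parses(T.NUMBER, T.DOT, T.IDENTIFIER));                  // true
        // 3.4.hello -> NUMBER DOT NUMBER DOT IDENTIFIER
        System.out.println(parses(T.NUMBER, T.DOT, T.NUMBER, T.DOT, T.IDENTIFIER)); // true
    }
}
```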
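But the new problem is that it also parses:
3 . 4 . hello
because the WHITESPACE tokens go to the HIDDEN channel, so the parser never
sees them and NUMBER DOT NUMBER matches even with spaces in between.
Float literals should be adjacent, as in the Java grammar.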
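A minimal sketch of one way to enforce that adjacency, assuming the parser can see each token's character offsets. It is hand-written Java, not ANTLR's API, and the names AdjacencySketch, Tok, tokenize and parseStart are hypothetical. Whitespace is dropped as it would be on the HIDDEN channel, but the float alternative is only taken when the NUMBER, DOT and NUMBER tokens touch each other in the source text. Unlike the original start rule, this sketch also requires the whole input to be consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch (not ANTLR API or generated code):
// accept floatLiteral only when NUMBER DOT NUMBER are adjacent in the source.
public class AdjacencySketch {
    enum Type { NUMBER, DOT, IDENTIFIER }

    // start and stop are inclusive character offsets into the input.
    record Tok(Type type, int start, int stop) {}

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; }

    // Whitespace is skipped (like the HIDDEN channel), but positions are kept.
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int begin = i;
            if (c == ' ' || c == '\t') {
                i++;
            } else if (c == '.') {
                out.add(new Tok(Type.DOT, i, i));
                i++;
            } else if (isDigit(c)) {
                while (i < s.length() && isDigit(s.charAt(i))) i++;
                out.add(new Tok(Type.NUMBER, begin, i - 1));
            } else if (isLetter(c)) {
                while (i < s.length() && isLetter(s.charAt(i))) i++;
                out.add(new Tok(Type.IDENTIFIER, begin, i - 1));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }

    static boolean is(List<Tok> t, int i, Type type) {
        return i < t.size() && t.get(i).type() == type;
    }

    // True if tokens from..to (inclusive) have no characters between them.
    static boolean adjacent(List<Tok> t, int from, int to) {
        for (int k = from; k < to; k++) {
            if (t.get(k).stop() + 1 != t.get(k + 1).start()) return false;
        }
        return true;
    }

    // start : (floatLiteral | integerLiteral) DOT IDENTIFIER, whole input consumed.
    static boolean parseStart(String input) {
        List<Tok> t = tokenize(input);
        int p;
        if (is(t, 0, Type.NUMBER) && is(t, 1, Type.DOT) && is(t, 2, Type.NUMBER)
                && adjacent(t, 0, 2)) {
            p = 3; // floatLiteral
        } else if (is(t, 0, Type.NUMBER)) {
            p = 1; // integerLiteral
        } else {
            return false;
        }
        return is(t, p, Type.DOT) && is(t, p + 1, Type.IDENTIFIER) && t.size() == p + 2;
    }

    public static void main(String[] args) {
        System.out.println(parseStart("3.hello"));       // true
        System.out.println(parseStart("3.4.hello"));     // true
        System.out.println(parseStart("3 . 4 . hello")); // false
    }
}
```

In an ANTLR grammar the same position comparison would have to be written as a semantic predicate; that wiring is not shown here.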
2008/1/7, Gavin Lambert <antlr at mirality.co.nz>:
> At 16:26 7/01/2008, Mark Volkmann wrote:
> >It should be easy, right? Ter already gave the hint that the
> >problem is that it was greedily grabbing the DOT for FLOAT
> >instead of leaving it for the separator between the number
> >and the identifier. Piece of cake? Well, I've tried several
> >things I thought would work, to no avail.
> >Why in the world doesn't this work?
> [...]
> > backtrack = true; // I shouldn't need this, but I don't think it can hurt.
>
> It's not going to help, either. "backtrack = true" has no effect
> on the lexer.
>
> >FLOAT: NUMBER DOT NUMBER;
> >INTEGER: NUMBER;
> >IDENTIFIER: LETTER+;
> >DOT: '.';
> >fragment NUMBER: DIGIT+;
> >fragment LETTER: 'a' .. 'z';
> >fragment DIGIT: '0' .. '9';
>
> This has been discussed to death before. For reasons of
> performance (and some other obscure thing, I think), when
> processing a + loop ANTLR will use k=1 lookahead. Thus when faced
> with the choice between FLOAT and INTEGER, it looks ahead to see
> at least one DIGIT and then says "ok, that's a FLOAT". It doesn't
> look past all the DIGITs to see whether there's a DOT or
> not. (Ter has said he might look into improving this a bit in a
> later version.)
>
> Whenever there's a common prefix in your tokens, you will need to
> combine the rules to remove the ambiguity:
>
> INTEGER
> : NUMBER
> ( /* nothing afterwards */
> | DOT NUMBER { $type = FLOAT; }
> )
> ;
>
>
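The combined rule above can be mirrored in a small hand-written Java lexer to show the intended token stream. This is an illustrative sketch, not ANTLR-generated code, and the class and method names are hypothetical. It looks two characters ahead (a DOT followed by a DIGIT) before treating the dot as part of the number, so 3.hello still lexes as INTEGER DOT IDENTIFIER.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch (not ANTLR output) of the combined rule:
//   INTEGER : NUMBER ( /* nothing */ | DOT NUMBER { $type = FLOAT; } ) ;
public class CombinedNumberLexer {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static List<String> tokenTypes(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // WHITESPACE: hidden, so not emitted
            } else if (isDigit(c)) {
                while (i < s.length() && isDigit(s.charAt(i))) i++; // NUMBER
                String type = "INTEGER";
                // Second alternative only if a DOT is immediately followed by a digit.
                if (i + 1 < s.length() && s.charAt(i) == '.' && isDigit(s.charAt(i + 1))) {
                    i++; // DOT
                    while (i < s.length() && isDigit(s.charAt(i))) i++; // NUMBER
                    type = "FLOAT"; // $type = FLOAT
                }
                out.add(type);
            } else if (c == '.') {
                out.add("DOT");
                i++;
            } else if (c >= 'a' && c <= 'z') {
                while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') i++;
                out.add("IDENTIFIER");
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("3.hello"));       // [INTEGER, DOT, IDENTIFIER]
        System.out.println(tokenTypes("3.4.hello"));     // [FLOAT, DOT, IDENTIFIER]
        System.out.println(tokenTypes("3 . 4 . hello")); // [INTEGER, DOT, INTEGER, DOT, IDENTIFIER]
    }
}
```

Because FLOAT is a single lexer token here, 3 . 4 . hello comes out as separate INTEGER and DOT tokens, and a parser rule start : (FLOAT | INTEGER) DOT IDENTIFIER would reject it.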
--
Lecturer Fırat Küçük
ADAMYO Distance Learning
SAKARYA University / TURKEY