[antlr-interest] a simple (not for me :)) grammar problem

Mon Jan 7 04:12:08 PST 2008

On Jan 7, 2008 2:20 AM, Fırat Küçük <firatkucuk at gmail.com> wrote:
> this is my simple solution:
>
> the original sample grammar:
> grammar Sample;
>
> start           :  (FLOAT | INTEGER) DOT IDENTIFIER;
>
> FLOAT           :  NUMBER DOT NUMBER;
> INTEGER         :  NUMBER;
> IDENTIFIER      :  LETTER+;
> DOT             :  '.';
> WHITESPACE      :  (' ' | '\t')+ {$channel = HIDDEN;};
> fragment NUMBER :  DIGIT+;
> fragment LETTER :  'a' .. 'z';
> fragment DIGIT  :  '0' .. '9';
>
> I can convert the FLOAT and INTEGER lexer rules to parser rules
> so that I can use syntactic predicates.
>
>
> grammar Sample;
>
> start
> options {backtrack = true;}
>         :  (floatLiteral | integerLiteral) DOT IDENTIFIER
>         ;
>
> floatLiteral    :  NUMBER DOT NUMBER;
> integerLiteral  :  NUMBER;
> IDENTIFIER      :  LETTER+;
> DOT             :  '.';
> WHITESPACE      :  (' ' | '\t')+ {$channel = HIDDEN;};
> NUMBER          :  DIGIT+;
> fragment LETTER :  'a' .. 'z';
> fragment DIGIT  :  '0' .. '9';
>
>
> it parses:
> 3.hello
> and
> 3.4.hello
>
> but the new problem is that it also parses:
>
> 3   .   4   .  hello
>
> The parts of a float literal should be adjacent, as in the Java grammar.

I think you just need to stop sending whitespace to the hidden
channel. That worked for me: once the parser sees the WHITESPACE
tokens, "3 . 4" no longer matches NUMBER DOT NUMBER. Of course, that
means every other rule has to specify exactly where whitespace is
allowed, which may be tedious. Maybe someone will offer a better
solution.
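
To make the effect concrete, here is a rough Python model of the idea
(not ANTLR output, just a hand-rolled sketch). The token names follow
the grammar quoted above; `tokenize`, `matches_start` and the
`hide_whitespace` flag are made-up names for this illustration only.

```python
import re

# Token names mirror the quoted grammar; everything else is hypothetical.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENTIFIER", r"[a-z]+"),
    ("DOT", r"\."),
    ("WHITESPACE", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text, hide_whitespace):
    """Return the list of token types the parser would get to see."""
    types = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        # Hiding whitespace models {$channel = HIDDEN;}: the parser never sees it.
        if match.lastgroup == "WHITESPACE" and hide_whitespace:
            continue
        types.append(match.lastgroup)
    return types


def matches_start(text, hide_whitespace):
    """Model of: start : (floatLiteral | integerLiteral) DOT IDENTIFIER ;"""
    types = tokenize(text, hide_whitespace)
    float_form = ["NUMBER", "DOT", "NUMBER", "DOT", "IDENTIFIER"]
    integer_form = ["NUMBER", "DOT", "IDENTIFIER"]
    return types in (float_form, integer_form)
```

In this model, `matches_start("3   .   4   .  hello", True)` accepts the
spaced-out input, which is the reported problem, while passing `False`
rejects it because WHITESPACE tokens sit between NUMBER and DOT. The
inputs `3.hello` and `3.4.hello` are accepted either way.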

> 2008/1/7, Gavin Lambert <antlr at mirality.co.nz>:
> > At 16:26 7/01/2008, Mark Volkmann wrote:
> >  >It should be easy, right? Ter already gave the hint that the
> >  >problem is that it was greedily grabbing the DOT for FLOAT
> >  >instead of leaving it for the separator between the number
> >  >and the identifier. Piece of cake? Well, I've tried several
> >  >things I thought would work, to no avail.
> >  >Why in the world doesn't this work?
> > [...]
> >  > backtrack = true; // I shouldn't need this, but I don't think it can hurt.
> >
> > It's not going to help, either.  "backtrack = true" has no effect
> > on the lexer.
> >
> >  >FLOAT: NUMBER DOT NUMBER;
> >  >INTEGER: NUMBER;
> >  >IDENTIFIER: LETTER+;
> >  >DOT: '.';
> >  >fragment NUMBER: DIGIT+;
> >  >fragment LETTER: 'a' .. 'z';
> >  >fragment DIGIT: '0' .. '9';
> >
> > This has been discussed to death before.  For reasons of
> > performance (and some other obscure thing, I think), when
> > processing a + loop ANTLR will use k=1 lookahead.  Thus when faced
> > with the choice between FLOAT and INTEGER, it looks ahead to see
> > at least one DIGIT and then says "ok, that's a FLOAT".  It doesn't
> > look past all the DIGITs to see whether there's a DOT or
> > not.  (Ter has said he might look into improving this a bit in a
> > later version.)
> >
> > Whenever there's a common prefix in your tokens, you will need to
> > combine the rules to remove the ambiguity:
> >
> > INTEGER
> >    : NUMBER
> >      ( /* nothing afterwards */
> >      | DOT NUMBER { $type = FLOAT; }
> >      )
> >    ;
> >
> >
>
>
>
> --
> Öğr. Gör. Fırat Küçük
> ADAMYO Distance Learning
> SAKARYA University / TURKEY
>
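
For comparison, Gavin's combined INTEGER/FLOAT rule quoted above
amounts to scanning the digits once and deciding the token type
afterwards. A rough Python sketch of that idea follows. `scan_number`
is a hypothetical helper, and the check that a digit must follow the
dot is an assumption added here so that `3.hello` leaves the dot for
the parser; whether ANTLR's generated lexer makes the same choice
without an explicit predicate is not something this sketch shows.

```python
DIGITS = "0123456789"


def scan_number(text, pos=0):
    """Scan a number at text[pos:]; return (token_type, lexeme, next_pos).

    Hypothetical helper modelling the combined rule:
    INTEGER : NUMBER ( /* nothing */ | DOT NUMBER { $type = FLOAT; } ) ;
    """
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        raise ValueError(f"expected a digit at position {pos}")

    token_type = "INTEGER"
    # Assumption: only take the dot when a digit follows it, so that
    # "3.hello" leaves the dot to be matched as a separate DOT token.
    if end + 1 < len(text) and text[end] == "." and text[end + 1] in DIGITS:
        end += 2
        while end < len(text) and text[end] in DIGITS:
            end += 1
        token_type = "FLOAT"
    return token_type, text[pos:end], end
```

Because nothing is skipped inside a token, `3 . 4` can only start with
an INTEGER here, so the adjacency requirement is met without making
whitespace visible to the parser.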

-- 
R. Mark Volkmann
Object Computing, Inc.