[antlr-interest] Not consuming lexer tokens

Thomas Brandon tbrandonau at gmail.com
Tue Aug 26 06:23:32 PDT 2008


On Tue, Aug 26, 2008 at 10:29 PM, Uwe Lammers
<Uwe.Lammers at sciops.esa.int> wrote:
> Hi,
>
> I have recently discovered antlr ... what a wonderful tool!
>
> Newbie question (did not find it addressed in the Wiki FAQ though):
> I have been working with flex/bison in the past and want to convert an
> existing grammar to antlr. Have managed ok so far but can't see how to
> do the following: In flex, a symbol can be marked as belonging to a token
> but it is actually not consumed by the lexer, e.g.
>
> Lexer rule
> abc/(        { return AFUNC; }
>
> Parser rule
> arith:
> AFUNC '(' arith ')'    { ... }
>
> So, an input like 'abc(...)' is matched by the parser rule arith because
> the first opening parenthesis was not consumed by the lexer - the preceding
> '/' does this.
>
> Is there an equivalent construct in antlr?
>
> thanks for any answer
> Uwe
>
Not directly, though you can use predicates to accomplish this. Something like:
FUNC: 'abc' { LA(1) == '(' }?;
will ensure a '(' follows without consuming it.
However, ANTLR doesn't seem to hoist predicates on the right edge of a rule
into its predictor. So while
FUNC:	{ LA(4) == '(' }? 'abc';
ID	:	'abc';
works, something like:
FUNC:	'abc' { LA(1) == '(' }?;
ID	:	'abc';
causes an error, as ID can't match anything. This looks like a possible bug.
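
To see why the predicate on the left edge needs LA(4) while the one on
the right edge needs LA(1), here is a small Python sketch of a stream with
1-based lookahead. The CharStream class and its method names are made up
for illustration; they are not ANTLR's API.

```python
class CharStream:
    """Hypothetical character stream with 1-based lookahead (illustration only)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next unconsumed character

    def la(self, k):
        """Return the k-th character ahead (k=1 is the next one) without
        consuming anything, or None past the end of the input."""
        i = self.pos + k - 1
        return self.text[i] if i < len(self.text) else None

    def consume(self, n=1):
        """Advance past n characters."""
        self.pos += n


stream = CharStream("abc(x)")
before = stream.la(4)  # asked before 'abc' is matched: looks past 'a', 'b', 'c'
stream.consume(3)      # match 'abc'
after = stream.la(1)   # asked after 'abc' is matched: the very next character
print(before, after, stream.pos)
```

Both lookups see the same character, and neither one moves the stream past
the parenthesis, so it is still there for the next token.
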
If you want FUNC and ID to share a prefix like this, you can instead do:
ID:
    'abc'
    ( ('(')=> {$type = FUNC;} )?
    ;
fragment
FUNC:	'abc' ;

Because FUNC is a fragment rule, it will never match on its own, so it
doesn't matter what its content is; it is just used to define the token type.
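
The same idea (match the name, peek at the next character, then change the
token type without consuming anything) can be sketched as a hand-written
lexer in Python. This is only an illustration, not what ANTLR generates:
the tokenize function and the token names other than ID and FUNC are
assumptions, and it accepts any identifier rather than just 'abc'.

```python
import re

ID, FUNC, LPAREN, RPAREN = "ID", "FUNC", "LPAREN", "RPAREN"
NAME = re.compile(r"[A-Za-z_]\w*")


def tokenize(text):
    """Return a list of (type, text) pairs for a tiny toy language."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        m = NAME.match(text, pos)
        if m:
            pos = m.end()
            # Peek at the next character without consuming it; a name
            # directly followed by '(' is retagged from ID to FUNC.
            ttype = FUNC if text[pos:pos + 1] == "(" else ID
            tokens.append((ttype, m.group()))
        elif ch == "(":
            tokens.append((LPAREN, ch))
            pos += 1
        elif ch == ")":
            tokens.append((RPAREN, ch))
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at {pos}")
    return tokens


print(tokenize("abc(abc)"))
```

The '(' still comes out as its own token, so a parser rule like
AFUNC '(' arith ')' from the original question can match it.
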

Tom.

