[antlr-interest] Preventing longest match in the lexer
Kevin J. Cummings
cummings at kjchome.homeip.net
Fri Apr 8 17:29:18 PDT 2011
On 04/08/2011 02:52 PM, A Z wrote:
> Hello all,
>
> Is there a way to force 'match first' among a group of tokens? In the code
> below, if 'undef(' or 'undef ' is found, it matches DIR_MACRO regardless of
> the predicate. I can see why it would do this, but I'm trying to find a way
> to match the DIR_UNDEF rule without resorting to combining the two rules and
> manually modifying the token type.
Because both tokens begin with '`', you have few choices here:
left-factor and combine the rules, or increase the "k" for those rules.
I suspect your problem is that 'undef' also matches SimpleIdent, while
you are looking for '`undef' as a single token.
I would combine the rules. (I would also lex this as '`' followed by
'undef', but that is another matter. Your DIR_UNDEF token contains an
awful lot of text, including whitespace?)
>
> DIR_UNDEF :
> '`undef'
> SLSpace+ var0=SimpleIdent;
>
> DIR_MACRO :
> '`' var0=SimpleIdent
> (
> {cond1(var0) == true}? =>
> | {cond2(var0) == true}? => Args
> | //Both conditionals false
> );
>
> fragment Args : ' '* '(' ;
fragment DIR_UNDEF : ;
DIR_MACRO :
( '`undef' )=> '`undef' SLSpace+ var0=SimpleIdent
{ $type = DIR_UNDEF; }
| '`' var0=SimpleIdent
( {cond1(var0) == true}? =>
| {cond2(var0) == true}? => Args
| //Both conditionals false
);
Now it will try to lex `undef first, and only then look for ` followed
by any SimpleIdent. The order of lexing is guaranteed.
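
For anyone who wants to see the "specific alternative first, general
alternative second" ordering outside of ANTLR, here is a minimal
hand-written sketch in Python. It is not what ANTLR generates. The
IDENT and SLSPACE patterns are assumptions standing in for SimpleIdent
and SLSpace (their real definitions are not shown in this thread), and
the cond1/cond2/Args part of DIR_MACRO is left out.

```python
import re

# Hypothetical stand-ins for the grammar's fragments; the real
# SimpleIdent and SLSpace rules are not shown in the thread.
IDENT = r"[A-Za-z_][A-Za-z0-9_$]*"
SLSPACE = r"[ \t]"

# Alternatives in priority order: the specific `undef form is tried
# before the general "backtick + identifier" form.
ALTERNATIVES = [
    ("DIR_UNDEF", re.compile(r"`undef" + SLSPACE + r"+" + IDENT)),
    ("DIR_MACRO", re.compile(r"`" + IDENT)),
]

def lex_directive(text, pos=0):
    """Return (token_type, lexeme) for the first alternative matching at pos."""
    for token_type, pattern in ALTERNATIVES:
        m = pattern.match(text, pos)
        if m:
            return token_type, m.group(0)
    return None  # no directive starts at this position

print(lex_directive("`undef FOO"))
print(lex_directive("`FOO(1)"))
```

The first alternative in this sketch only succeeds when whitespace and
an identifier follow `undef, so input like `undefined falls through to
the general alternative. The predicate in the grammar above looks only
at the '`undef' prefix, so the two may not agree on such input.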
--
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)