[antlr-interest] Preventing longest match in the lexer

Kevin J. Cummings cummings at kjchome.homeip.net
Fri Apr 8 17:29:18 PDT 2011


On 04/08/2011 02:52 PM, A Z wrote:
> Hello all,
> 
> Is there a way to force 'match first' among a group of tokens? In the code
> below, if 'undef(' or 'undef ' is found, it matches DIR_MACRO regardless of
> the predicate. I can see why it would do this, but I'm trying to find a way
> to match the DIR_UNDEF rule without resorting to combining the two rules and
> manually modify the token type.

Because both tokens begin with '`', you have few choices here:
left-factor and combine the rules, or increase the "k" for those rules.

I suspect your problem is that 'undef' also matches SimpleIdent, while
you are looking for '`undef' as a single token.  I would combine the
rules.  (I would also lex this as two tokens, '`' and 'undef', but that
is another matter.  Your DIR_UNDEF token contains an awful lot of text,
including whitespace?)

> 
> DIR_UNDEF :
>   '`undef'
>   SLSpace+ var0=SimpleIdent;
> 
> DIR_MACRO :
>   '`' var0=SimpleIdent
>   (
>     {cond1(var0) == true}? =>
>   | {cond2(var0) == true}? => Args
>   | //Both conditionals false
>   );
> 
> fragment Args : ' '* '(' ;

fragment DIR_UNDEF : ;

DIR_MACRO :
      ( '`undef' )=> '`undef' SLSpace+ var0=SimpleIdent
          { $type = DIR_UNDEF; }
   |  '`' var0=SimpleIdent
      ( {cond1(var0) == true}? =>
      | {cond2(var0) == true}? => Args
      | //Both conditionals false
      );

Now it will try to lex `undef first, and only then look for ` followed
by any SimpleIdent.  The order of lexing is guaranteed.
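
To illustrate the "first match wins" ordering outside of ANTLR, here is a
minimal Python sketch.  It is not what ANTLR generates: the SIMPLE_IDENT
and SL_SPACE patterns and the match_directive function are made up for
the example (the real SimpleIdent and SLSpace fragments are not shown in
this thread), and the cond1/cond2 predicates are left out.

```python
import re

# Hypothetical stand-ins for the grammar's SimpleIdent and SLSpace
# fragments; the real definitions are not shown in the thread.
SIMPLE_IDENT = r"[A-Za-z_][A-Za-z0-9_$]*"
SL_SPACE = r"[ \t]"

# Alternatives in priority order: the specific `undef form comes first,
# the generic `identifier form second.
ALTERNATIVES = [
    ("DIR_UNDEF", re.compile(r"`undef" + SL_SPACE + r"+" + SIMPLE_IDENT)),
    ("DIR_MACRO", re.compile(r"`" + SIMPLE_IDENT)),
]


def match_directive(text, pos=0):
    """Return (token_type, lexeme) for the first alternative matching at pos.

    Alternatives are tried strictly in list order, so a later pattern is
    only considered when every earlier one has failed.  Returns None if
    nothing matches.
    """
    for token_type, pattern in ALTERNATIVES:
        m = pattern.match(text, pos)
        if m:
            return token_type, m.group(0)
    return None
```

With this ordering, "`undef FOO" comes back as DIR_UNDEF and "`define"
as DIR_MACRO.  One difference to keep in mind: the sketch falls through
to the generic alternative whenever the first pattern fails as a whole
(for example on "`undefined"), whereas a syntactic predicate on '`undef'
alone may commit to the first alternative once that prefix is seen.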

-- 
Kevin J. Cummings
kjchome at verizon.net
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)

