[antlr-interest] Lexer: Default token?

Mon Jul 24 06:12:06 PDT 2006

Just an update: I ended up using the following solution.  It's definitely
not the fastest way to do things, since each predicated alternative is
matched speculatively and then matched again for real, but it makes
maintenance of the grammar easier.

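// START is the only non-protected rule, so the lexer always enters here.
// Each alternative is guarded by a syntactic predicate; anything that no
// token rule accepts falls through to ANY_CHAR.
START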
  :
    (WS_)           => WS_          {$setType(WS_);}
  | (C_COMMENT)     => C_COMMENT    {$setType(C_COMMENT);}
  | (CPP_COMMENT)   => CPP_COMMENT  {$setType(CPP_COMMENT);}
  | (IDENTIFIER)    => IDENTIFIER   {$setType(IDENTIFIER);}
  | (INT)           => INT          {$setType(INT);}
  | (LCURLY)        => LCURLY       {$setType(LCURLY);}
  | (RCURLY)        => RCURLY       {$setType(RCURLY);}
  | (LPAREN)        => LPAREN       {$setType(LPAREN);}
  | (RPAREN)        => RPAREN       {$setType(RPAREN);}
  | (LBRACKET)      => LBRACKET     {$setType(LBRACKET);}
  | (RBRACKET)      => RBRACKET     {$setType(RBRACKET);}
  | ANY_CHAR                        {$setType(ANY_CHAR);}
  ;

// Fallback: matches any single character.
protected
ANY_CHAR : . ;

// The remaining token rules go here (and are all protected).
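
// For illustration only: a sketch of what a few of those protected rules
// might look like in ANTLR 2 syntax.  The rule bodies below are assumptions,
// not the actual grammar.  The sketch also assumes the lexer's options
// section sets a character vocabulary, e.g.
//   charVocabulary = '\3'..'\377';
// so that '.' and '~' have a defined range to work with.  Line counting
// (calling newline()) is left out to keep the sketch short.

protected
WS_ : ( ' ' | '\t' | '\r' | '\n' )+ ;

protected
C_COMMENT : "/*" ( options { greedy = false; } : . )* "*/" ;

protected
CPP_COMMENT : "//" ( ~( '\n' | '\r' ) )* ;

protected
IDENTIFIER : ( 'a'..'z' | 'A'..'Z' | '_' ) ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' )* ;

protected
INT : ( '0'..'9' )+ ;

protected
LCURLY : '{' ;

// Because these rules are protected, the lexer never returns them directly.
// They only produce tokens when START calls them and sets the token type.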

Let me know if anyone has a better idea for how to do this.

-Eric

> I have a case where, depending upon the context of the parse, I want to
> either parse all tokens or ignore them, but still save them to the AST
> (hidden tokens are fine).  The problem is that there aren't any rules for
> the characters defined in the lexer, so the lexer throws a parse
> exception.
>
> Is there a way, in the lexer, to specify a default token type such that
> if a character/character-sequence doesn't match any of the token rules,
> then it gets packaged up in this default token type?
>
> I can always do this by hand, but it's tedious and gets tricky for rules
> that match comments, etc.
> 
> -Eric