[antlr-interest] Lexer: Default token?

Eric eric-public at omnicurious.com
Mon Jul 24 06:37:46 PDT 2006


This is a repost to see if I can get this to match up with the previous
message thread.  Sorry for the extra messages. <blush>

Original message:

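Just an update: I ended up using the following solution.  It's definitely
not the fastest way to do things, since each syntactic predicate matches its
rule speculatively before the rule is matched again for real, but it makes
the grammar easier to maintain.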

START
  :
    (WS_)           => WS_          {$setType(WS_);}
  | (C_COMMENT)     => C_COMMENT    {$setType(C_COMMENT);}
  | (CPP_COMMENT)   => CPP_COMMENT  {$setType(CPP_COMMENT);}
  | (IDENTIFIER)    => IDENTIFIER   {$setType(IDENTIFIER);}
  | (INT)           => INT          {$setType(INT);}
  | (LCURLY)        => LCURLY       {$setType(LCURLY);}
  | (RCURLY)        => RCURLY       {$setType(RCURLY);}
  | (LPAREN)        => LPAREN       {$setType(LPAREN);}
  | (RPAREN)        => RPAREN       {$setType(RPAREN);}
  | (LBRACKET)      => LBRACKET     {$setType(LBRACKET);}
  | (RBRACKET)      => RBRACKET     {$setType(RBRACKET);}
  | ANY_CHAR                        {$setType(ANY_CHAR);}
  ;

protected 
ANY_CHAR: .;  

// The remaining token rules go here (and are all protected).
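
For anyone less familiar with ANTLR 2 syntactic predicates, the effect of the
START rule can be approximated in plain Python.  This is only a rough sketch,
not what ANTLR generates: the regular expressions are assumed definitions for
the protected rules (they aren't shown in this post), and TOKEN_RULES and
tokenize are hypothetical names used for illustration.

```python
import re

# Assumed patterns for the protected rules, tried in the same order as the
# alternatives in START. Every pattern matches at least one character.
TOKEN_RULES = [
    ("WS_", re.compile(r"[ \t\r\n]+")),
    ("C_COMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("CPP_COMMENT", re.compile(r"//[^\r\n]*")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("INT", re.compile(r"[0-9]+")),
    ("LCURLY", re.compile(r"\{")),
    ("RCURLY", re.compile(r"\}")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("LBRACKET", re.compile(r"\[")),
    ("RBRACKET", re.compile(r"\]")),
]


def tokenize(text):
    """Return (type, text) pairs; unmatched characters become ANY_CHAR."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched here: emit one character as the default token.
            tokens.append(("ANY_CHAR", text[pos]))
            pos += 1
    return tokens
```

Under these assumed patterns, tokenize("x = 1;") yields IDENTIFIER, WS_,
ANY_CHAR, WS_, INT, ANY_CHAR, so characters without a rule are kept as tokens
instead of raising an error.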


Let me know if people have some better ideas on how to do this.

-Eric



> Date: Sat, 22 Jul 2006 13:53:23 -0600
> From: "Eric Holmberg" <eric at omnicurious.com>
> Subject: [antlr-interest] Lexer:  Default token?
> To: <antlr-interest at antlr.org>
> Message-ID: <000c01c6adc8$7e25f8c0$0a00a8c0 at FASTBRICK>
> Content-Type: text/plain;	charset="us-ascii"
> 
> I have a case where, depending upon the context of the parse, I want to
> either parse all tokens or ignore them, but still save them to the AST 
> (hidden tokens are fine).  The problem is that the lexer doesn't define 
> rules for some of the characters, so the lexer throws a parse 
> exception.
>
> Is there a way, in the lexer, to specify a default token type such
> that if a character/character-sequence doesn't match any of the token 
> rules, then it gets packaged up in this default token type?
>
> I can always do this by hand, but it's tedious and gets tricky for
> rules that match comments, etc.
> 
> -Eric


