[antlr-interest] Lexer: Default token?
Eric
eric-public at omnicurious.com
Mon Jul 24 06:37:46 PDT 2006
This is a repost to see if I can get this to match up with the previous
message thread. Sorry for the extra messages. <blush>
Original message:
Just an update, I ended up using the following solution. It's definitely
not the fastest way to do things, but it makes maintenance of the grammar
easier.
START
    :   (WS_)         => WS_         {$setType(WS_);}
    |   (C_COMMENT)   => C_COMMENT   {$setType(C_COMMENT);}
    |   (CPP_COMMENT) => CPP_COMMENT {$setType(CPP_COMMENT);}
    |   (IDENTIFIER)  => IDENTIFIER  {$setType(IDENTIFIER);}
    |   (INT)         => INT         {$setType(INT);}
    |   (LCURLY)      => LCURLY      {$setType(LCURLY);}
    |   (RCURLY)      => RCURLY      {$setType(RCURLY);}
    |   (LPAREN)      => LPAREN      {$setType(LPAREN);}
    |   (RPAREN)      => RPAREN      {$setType(RPAREN);}
    |   (LBRACKET)    => LBRACKET    {$setType(LBRACKET);}
    |   (RBRACKET)    => RBRACKET    {$setType(RBRACKET);}
    |   ANY_CHAR                     {$setType(ANY_CHAR);}
    ;

protected
ANY_CHAR
    :   .
    ;

// The remaining token rules go here (and are all protected).
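
For anyone who wants to try the idea outside of ANTLR, here is a rough
Python sketch of the same "try each rule in order, fall back to a
one-character token" approach. It is not ANTLR-generated code, only an
illustration. The names TOKEN_SPECS and tokenize are made up for this
example, and the regular expressions are guesses at what the protected
rules might match, since those rules are not shown here.

```python
import re

# Hypothetical token table, tried in order. The names mirror the START
# rule; the patterns are assumptions, not the actual protected rules.
TOKEN_SPECS = [
    ("WS_",         r"[ \t\r\n]+"),
    ("C_COMMENT",   r"/\*.*?\*/"),
    ("CPP_COMMENT", r"//[^\n]*"),
    ("IDENTIFIER",  r"[A-Za-z_][A-Za-z_0-9]*"),
    ("INT",         r"[0-9]+"),
    ("LCURLY",      r"\{"),
    ("RCURLY",      r"\}"),
    ("LPAREN",      r"\("),
    ("RPAREN",      r"\)"),
    ("LBRACKET",    r"\["),
    ("RBRACKET",    r"\]"),
    # Default token: any single character that nothing above matched.
    ("ANY_CHAR",    r"."),
]

# One alternation of named groups; earlier alternatives win, like the
# ordered alternatives in START. DOTALL lets "." match newlines too.
MASTER = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in TOKEN_SPECS),
    re.DOTALL,
)

def tokenize(text):
    """Return a list of (token_type, lexeme) pairs covering all of text."""
    tokens = []
    pos = 0
    while pos < len(text):
        # ANY_CHAR matches any character, so this match never fails.
        match = MASTER.match(text, pos)
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("a@1") gives an IDENTIFIER, an ANY_CHAR for the "@",
and an INT, so unknown characters end up in the default token instead
of raising an error. As in the grammar, the order of the table matters:
ANY_CHAR has to come last.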
Let me know if people have some better ideas on how to do this.
-Eric
> Date: Sat, 22 Jul 2006 13:53:23 -0600
> From: "Eric Holmberg" <eric at omnicurious.com>
> Subject: [antlr-interest] Lexer: Default token?
> To: <antlr-interest at antlr.org>
> Message-ID: <000c01c6adc8$7e25f8c0$0a00a8c0 at FASTBRICK>
> Content-Type: text/plain; charset="us-ascii"
>
> I have a case where depending upon the context of the parse, I want to
> either parse all tokens or ignore them, but still save them to the AST
> (hidden tokens are fine). The problem is that there aren't any rules
> for the characters defined in the lexer, so the lexer throws a parse
> exception.
>
> Is there a way, in the lexer, to specify a default token type such
> that if a character/character-sequence doesn't match any of the token
> rules, then it gets packaged up in this default token type?
>
> I can always do this by hand, but it's tedious and gets tricky for
> rules that match comments, etc.
>
> -Eric