[antlr-interest] Lexer: Default token?
Eric
eric-public at omnicurious.com
Mon Jul 24 06:12:06 PDT 2006
Just an update: I ended up using the following solution. It's definitely
not the fastest way to do things, since every alternative is guarded by a
syntactic predicate, but it makes maintenance of the grammar easier.
START
    :
      (WS_) => WS_ {$setType(WS_);}
    | (C_COMMENT) => C_COMMENT {$setType(C_COMMENT);}
    | (CPP_COMMENT) => CPP_COMMENT {$setType(CPP_COMMENT);}
    | (IDENTIFIER) => IDENTIFIER {$setType(IDENTIFIER);}
    | (INT) => INT {$setType(INT);}
    | (LCURLY) => LCURLY {$setType(LCURLY);}
    | (RCURLY) => RCURLY {$setType(RCURLY);}
    | (LPAREN) => LPAREN {$setType(LPAREN);}
    | (RPAREN) => RPAREN {$setType(RPAREN);}
    | (LBRACKET) => LBRACKET {$setType(LBRACKET);}
    | (RBRACKET) => RBRACKET {$setType(RBRACKET);}
    | ANY_CHAR {$setType(ANY_CHAR);}
    ;

protected
ANY_CHAR: .;

// The remaining token rules go here (and are all protected).
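
For anyone who wants to see the control flow outside of ANTLR, here is a rough Python sketch of the same idea: try each token rule in a fixed order and, if none matches, package a single character as ANY_CHAR. This is only an illustration, not ANTLR-generated code. The regular expressions and the names TOKEN_RULES and tokenize are assumptions made up for this sketch; only the token names come from the grammar above.

```python
import re

# Hypothetical token rules, tried in order like the alternatives of START.
# The patterns are illustrative guesses, not taken from the real grammar.
TOKEN_RULES = [
    ("WS_", re.compile(r"[ \t\r\n]+")),
    ("C_COMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("CPP_COMMENT", re.compile(r"//[^\n]*")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("INT", re.compile(r"[0-9]+")),
    ("LCURLY", re.compile(r"\{")),
    ("RCURLY", re.compile(r"\}")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
    ("LBRACKET", re.compile(r"\[")),
    ("RBRACKET", re.compile(r"\]")),
]


def tokenize(text):
    """Return (type, text) pairs; unmatched characters become ANY_CHAR."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                # First rule that matches wins, as with the predicates.
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched: consume exactly one character as the default.
            tokens.append(("ANY_CHAR", text[pos]))
            pos += 1
    return tokens
```

With these assumed patterns, tokenize("a @ 1") yields an IDENTIFIER, a WS_, an ANY_CHAR for "@", another WS_, and an INT. Note that an unterminated "/*" falls through to two ANY_CHAR tokens rather than raising an error, which is the behavior I was after.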
Let me know if people have some better ideas on how to do this.
-Eric
> I have a case where, depending upon the context of the parse, I want to
> either parse all tokens or ignore them, but still save them to the AST
> (hidden tokens are fine). The problem is that there aren't any rules for
> the characters defined in the lexer, so the lexer throws a parse
> exception.
>
> Is there a way, in the lexer, to specify a default token type such that if
> a character/character-sequence doesn't match any of the token rules, then
> it gets packaged up in this default token type?
>
> I can always do this by hand, but it's tedious and gets tricky for rules
> that match comments, etc.
>
> -Eric