[antlr-interest] Forcing the lexer to never error

Jim Idle jimi at temporal-wave.com
Sat Jun 16 00:58:09 PDT 2012

You just want one rule as the last rule:
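
The rule itself is not shown above. A minimal sketch of what such a
catch-all commonly looks like in ANTLR 3, assuming the token name INVALID
from the question below (the name and the exact form are illustrative,
not the original text):

```
// Hypothetical catch-all; it has to be the LAST lexer rule in the grammar.
// '.' matches any single character, so input that no earlier rule can
// start a token with is emitted as a one-character INVALID token.
INVALID : . ;
```

Because the rule matches only one character, more specific rules should
still be preferred wherever they can match. Whether this removes every
lexer error for a particular grammar is worth confirming against the
generated lexer.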



-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
Sent: Saturday, June 16, 2012 6:14 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Forcing the lexer to never error

Hello all,

  This is all using ANTLR 3.4 with the C target. I'm trying to modify my
lexer grammar so that it never triggers a lexer error but instead emits a
special token, INVALID. So far I've done this by adding all invalid
sequences of characters to a special rule, INVALID:

ASCOLCOLAS                 : '*::*';

INVALID
 : '*:'
 | '*::';

This works, but it gets tedious for certain complex lexer rules. For
instance, the rule for a line directive is as follows:

  'line'  SLSpace+ DecDigits SLSpace+ StrChars SLSpace+ DecDigits SLSpace*

To handle this I'd have to add a fairly complex alternative to the INVALID
rule:

 | 'line'
  | SLSpace+
   | DecDigits ...

I also tried adding alternatives to the DIR_LINE rule instead.
Unfortunately, ANTLR sometimes fails to generate the code in this case,
even after I let it run for several minutes. I also don't have a way to
set the token type to INVALID. ANTLR places the token type assignment
after any lexer rule actions, overriding my changes.

    | ~DecDigits {LEXSTATE->type = INVALID;} //This gets ignored in the C
  | ~SLSpace {ctx->lineError();}

My first question is: are there performance issues caused by adding the
separate INVALID rule, as opposed to adding alternatives to existing rules?
My understanding is yes, since lookahead is needed to determine whether
REALNUM or INVALID should be entered, for instance.

Secondly, is there a way to force the token type based on a rule action?
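
One approach that may be worth trying, offered as an assumption rather
than a confirmed fix for the C target in 3.4: ANTLR 3 lexer rules expose a
$type attribute, and an action that assigns to it is meant to change the
type of the emitted token, because the generated code copies that value
into the token state at the end of the rule. The sketch below reuses
ASCOLCOLAS and INVALID from this message; folding the invalid prefixes
into the rule this way is hypothetical.

```
// Assumes INVALID is declared elsewhere, e.g. in a tokens { INVALID; }
// section or as its own lexer rule.
ASCOLCOLAS
    : '*::*'
    | '*::' { $type = INVALID; }   // incomplete operator
    | '*:'  { $type = INVALID; }   // incomplete operator
    ;
```

Assigning through $type, instead of writing to LEXSTATE->type directly,
should avoid the overwrite described above, since the assignment at the
end of the rule then carries the value set in the action.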

