[antlr-interest] Forcing the lexer to never error

Jim Idle jimi at temporal-wave.com
Sun Jun 17 03:11:22 PDT 2012


Then it means that your lexer rules are not able to cope with the
unexpected input. You need to code your rules so that they can accept bad
input and report it, or perhaps override the error handler and take some
kind of recovery action.

Jim
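
For readers following the thread, here is a minimal hand-written sketch
in C of the behavior being asked for: a tokenizer that never reports an
error and instead falls back to a one-character INVALID token whenever
no rule matches completely. It is not ANTLR-generated code and does not
use the ANTLR C runtime. Only ASCOLCOLAS and INVALID are taken from the
thread; TokKind, Token, TOK_DECDIGITS, next_token and tokenize are
hypothetical names made up for the illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds for this sketch; only ASCOLCOLAS and INVALID come from the thread. */
typedef enum { TOK_EOF, TOK_ASCOLCOLAS, TOK_DECDIGITS, TOK_INVALID } TokKind;

typedef struct {
    TokKind kind;
    size_t  start;  /* offset of the first character */
    size_t  len;    /* number of characters consumed */
} Token;

/* Each matcher returns the length of a complete match at s, or 0 for no match. */
static size_t match_ascolcolas(const char *s)
{
    return strncmp(s, "*::*", 4) == 0 ? 4 : 0;
}

static size_t match_decdigits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Never fails: if no rule matches completely, consume one character as INVALID. */
Token next_token(const char *input, size_t pos)
{
    Token t = { TOK_EOF, pos, 0 };
    const char *s = input + pos;
    size_t n;

    if (*s == '\0') return t;

    if ((n = match_ascolcolas(s)) > 0)      t.kind = TOK_ASCOLCOLAS;
    else if ((n = match_decdigits(s)) > 0)  t.kind = TOK_DECDIGITS;
    else { n = 1;                           t.kind = TOK_INVALID; }

    t.len = n;
    return t;
}

/* Fills out[] with up to max token kinds (excluding EOF) and returns the count. */
size_t tokenize(const char *input, TokKind *out, size_t max)
{
    size_t pos = 0, count = 0;
    for (;;) {
        Token t = next_token(input, pos);
        if (t.kind == TOK_EOF || count == max) break;
        out[count++] = t.kind;
        pos += t.len;
    }
    return count;
}
```

With the input "*::*12*:" this sketch produces ASCOLCOLAS, DECDIGITS,
INVALID, INVALID: the trailing "*:" is only a prefix of '*::*', so each
of its characters is emitted as a separate INVALID token rather than
raising an error.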

-----Original Message-----
From: A Z [mailto:asicaddress at gmail.com]
Sent: Saturday, June 16, 2012 10:36 PM
To: Jim Idle
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Forcing the lexer to never error

Thanks for the response.

I tried this but it doesn't give the expected behavior. The lexer still
generates exceptions for certain character sequences.

On 6/16/12, Jim Idle <jimi at temporal-wave.com> wrote:
> You just want one rule as the last rule:
>
> INVALID : . ;
>
> Jim
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
> Sent: Saturday, June 16, 2012 6:14 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Forcing the lexer to never error
>
> Hello all,
>
>   This is all using ANTLR 3.4 with the C target. I'm trying to modify
> my lexer grammar to never trigger a lexer error but instead emit a
> special token, INVALID. So far I've done this by adding all invalid
> sequences of characters to a special rule INVALID.
>
>
> ASCOLCOLAS                 : '*::*';
>
> INVALID :
>  ...
>  | '*:'
>  | '*::';
>
> This works but it gets tedious for certain complex lexer rules. For
> instance the rule for a line directive is as follows:
>
> DIR_LINE :
>   'line'  SLSpace+ DecDigits SLSpace+ StrChars SLSpace+ DecDigits SLSpace* '\n'
>
> To handle this I'd have to add a fairly complex alternative to the
> INVALID rule
>
> INVALID:
>   ...
>  | 'line'
>   (
>     ~SLSpace
>   | SLSpace+
>    (
>      ~DecDigits
>    | DecDigits ...
>    )
>   )
>
> I also tried adding alternatives to the DIR_LINE rule instead.
> Unfortunately, ANTLR sometimes fails to generate the code in this case,
> even after running for several minutes. I also don't have a way
> to set the token type to INVALID: ANTLR places the token type
> assignment after any lexer rule actions, overriding my changes.
>
> DIR_LINE :
>   'line'
>   (
>     SLSpace+
>     (
>       DecDigits
>       (
>        ...
>       |
>       )
>     | ~DecDigits {LEXSTATE->type = INVALID;} //This gets ignored in the C code
>     )
>   | ~SLSpace {ctx->lineError();}
>   )
>
>
>
> My first question is, are there performance issues caused by adding
> the separate INVALID rule as opposed to adding alternatives to existing
> rules? My understanding is yes, since lookahead is needed to determine
> whether REALNUM or INVALID should be entered, for instance.
>
> Secondly, is there a way to force the token type based on a rule action?
>
> Thanks
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address

