[antlr-interest] Can Antlr use a variable in a lexer pattern?

Jim Idle jimi at temporal-wave.com
Fri Feb 25 09:39:51 PST 2011


Just use action code.


However, you are committing the common error of trying to enforce things at
too low a level. You should let the \ escape any character at all, let the
parser produce a tree, and then, when walking the tree, look through the
strings and validate them. This will give you:

Invalid escape sequence '\g', can be '\m' or '\n' ...

Whereas a lexer failure gives:

Unexpected character 'g'

And your users will have no idea why. Also, the lexer will fail, so the
parser won't run, and so syntax errors and validation errors won't get
reported. So, for the sake of one mistyped character, the whole tool chain
will abort. Always push the errors as far down the chain as you can,
preferably to the semantic phase if technically possible. Basically, a lexer
should never fail if at all possible, even if that is only because the last
rule is:

BAD : . { error(UNKNOWN_CHARACTER, $text); skip(); } ;
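
As a rough illustration of the validate-afterwards approach described above,
here is a minimal Java sketch. It assumes the lexer has already accepted the
string with any character allowed after the escape, and that the tree walker
hands the string body to a check like this one. The class EscapeCheck, the
validate method and the "allowed" parameter are hypothetical names made up
for the example, not part of ANTLR, and the m/n set simply mirrors the
message above. Because the escape character is passed in as a parameter, it
can be whatever an ESCAPE clause supplied, even when that clause only appears
after the string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks escapes in a string the lexer already accepted. */
final class EscapeCheck {

    /**
     * Returns one message per invalid escape found in body.
     * escape is the escape character in effect; allowed lists the characters
     * that may follow it. Both are illustrative parameters (assumptions).
     */
    static List<String> validate(String body, char escape, String allowed) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) != escape) {
                continue; // ordinary character, nothing to check
            }
            if (i + 1 >= body.length()) {
                errors.add("Dangling escape character '" + escape + "' at end of string");
                break;
            }
            char next = body.charAt(++i); // consume the escaped character
            if (allowed.indexOf(next) < 0) {
                errors.add("Invalid escape sequence '" + escape + next
                        + "', can be one of: " + describe(escape, allowed));
            }
        }
        return errors;
    }

    /** Formats the permitted sequences, e.g. '\m', '\n'. */
    private static String describe(char escape, String allowed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < allowed.length(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('\'').append(escape).append(allowed.charAt(i)).append('\'');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The lexer let "a\gb" through as a string; the check runs afterwards.
        for (String message : validate("a\\gb", '\\', "mn")) {
            System.out.println(message);
        }
    }
}
```

In a tree walker you would call something like validate on each string node
and pass the messages to your own error reporting, so the lexer and parser
always run to completion and every problem gets reported in one pass.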

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Douglas Godfrey
> Sent: Friday, February 25, 2011 8:34 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Can Antlr use a variable in a lexer pattern?
>
> In the snippet below, can "escape_character" be a variable?
> It seems that this would not work, because the "escape_character" is not
> known until it is too late.
> The alternate form below might work if the ANTLR lexer can use a
> variable in the pattern match.
> Can the lexer apply the escape character as a post-processing
> validation step, i.e. accept anything within the quotes and then
> validate the sequence after the ESCAPE clause?
>
> Unicode_Identifier  =
>         U Ampersand
>         Double_Quote  ( Unicode_Identifier_Part )+ Double_Quote
>         ( ESCAPE escape_character )?
>         ;
>
>
> Alternate form:
>
> Unicode_Identifier  =
>         U Ampersand
>         ( ESCAPE escape_character )?
>         Double_Quote  ( Unicode_Identifier_Part )+ Double_Quote
>         ;
>
>
> fragment
> Unicode_Identifier_Part  =
>         Unicode_Permitted_Identifier_Character  |  Unicode_Escape_Value ;
>
> fragment
> Unicode_Escape_Value  =
>         Unicode_4_Digit_Escape_Value  |  Unicode_6_Digit_Escape_Value ;
>
> fragment
> Unicode_4_Digit_Escape_Value  =
>         escape_character  Hexit  Hexit  Hexit  Hexit ;
>
> fragment
> Unicode_6_Digit_Escape_Value  =
>         escape_character  Plus_Sign  Hexit  Hexit  Hexit  Hexit  Hexit  Hexit ;
>
> escape_character  = Back_Slash /*!! See the Syntax Rules*/ ;
>

