[antlr-interest] Lexing 7-bit ASCII stream
Gavin Lambert
antlr at mirality.co.nz
Tue Apr 21 04:53:33 PDT 2009
At 21:59 21/04/2009, Avid Trober wrote:
>I'm parsing a 7-bit ASCII stream ... 2 questions
>
>Question 1: can't I just fall through the lexer rules, where
>lexer rules are ordered specific-to-general, and avoid indeterminisms at
>run-time?
[...]
>... // (AND IF NOTHING ABOVE MATCHES, AT LEAST WE'RE MATCHING
>HERE ... )
>
>CHAR : '\u0000'..'\u007F' // any 7-bit US-ASCII character
> ;
You can specify a catch-all match like so:
CHAR : .;
If this is the last lexer rule, then it will behave as you're
expecting.
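As a rough illustration of why a catch-all placed last only picks up leftovers, here is a simplified Python model. Each rule is tried, the longest match wins, and on a tie the rule defined first wins. This is an assumption-laden sketch, not ANTLR's actual LL(*) lexer prediction. The rule names DIGIT, ID and WS are hypothetical; CHAR stands in for `CHAR : .;`.

```python
import re

# Hypothetical lexer rules in definition order (specific to general).
# DIGIT, ID and WS are illustrative names; CHAR plays the role of "CHAR : .;".
RULES = [
    ("DIGIT", re.compile(r"[0-9]")),
    ("ID", re.compile(r"[A-Za-z]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("CHAR", re.compile(r".", re.DOTALL)),  # catch-all: any single character
]


def tokenize(text):
    """Return (rule, text) pairs. The longest match wins; on a tie the rule
    defined first wins, so CHAR only gets characters no earlier rule matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_text = None, ""
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Require a strictly longer match so earlier rules keep ties.
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:  # only possible if the catch-all is removed
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


print(tokenize("a1 $"))
# [('ID', 'a'), ('DIGIT', '1'), ('WS', ' '), ('CHAR', '$')]
```

Here '1' and ' ' could also be matched by CHAR, but the earlier DIGIT and WS rules win the tie. Only '$', which nothing else matches, falls through to the catch-all.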
>Question 2: I'm at a loss as to how to match the notation in the spec
>I'm writing a grammar for, where binary digits are '0' or '1' and
>digits are '0'..'9' (ABNF-ish). It is preferred to make the
>grammar rule names match that (whether lexer or parser, it
>doesn't matter).
Generally, it's best to have the lexer match as broadly as possible
(i.e. have DIGIT, not BINARY_DIGIT) and sort it out in the parser,
where you can use the context to give better error messages if you
encounter something invalid.
>Can I write a binary_digit parser rule that works with DIGIT
>above somehow?
Yep. Depending on the context, you may want to use either a
lookahead-based entry predicate to avoid entering the rule if the
DIGITs aren't binary-safe, or an exit predicate that raises an
error if it turns out that the sequence wasn't valid binary.
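In ANTLR itself these checks would be written as semantic predicates (`{...}?` blocks) in the grammar. The Python below is only a hand-written recursive-descent model of the two strategies. It assumes the lexer has just the wide DIGIT rule, and the class and method names (Parser, binary_number_entry, binary_number_exit) are hypothetical.

```python
class ParseError(Exception):
    """Raised when the parser rejects the token stream."""


class Parser:
    """Hypothetical hand-written parser over (type, text) tokens, where the
    lexer only has a wide DIGIT rule ('0'..'9') and no BINARY_DIGIT rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def la(self, k=1):
        """Peek at the k-th upcoming token without consuming it."""
        i = self.pos + k - 1
        return self.tokens[i] if i < len(self.tokens) else ("EOF", "")

    def match(self, ttype):
        """Consume the next token if it has type ttype, else raise."""
        tok = self.la()
        if tok[0] != ttype:
            raise ParseError(f"expected {ttype}, found {tok[0]} {tok[1]!r}")
        self.pos += 1
        return tok[1]

    def binary_number_entry(self):
        """Entry-predicate style: scan the upcoming DIGIT run and only enter
        the rule if every digit is binary. Returns None with nothing consumed
        otherwise, so the caller can try another alternative."""
        k = 1
        while self.la(k)[0] == "DIGIT":
            if self.la(k)[1] not in ("0", "1"):
                return None
            k += 1
        if k == 1:
            return None  # no DIGIT ahead at all
        return "".join(self.match("DIGIT") for _ in range(k - 1))

    def binary_number_exit(self):
        """Exit-predicate style: consume every DIGIT, then raise an error
        naming the offending text if the result is not valid binary."""
        digits = []
        while self.la()[0] == "DIGIT":
            digits.append(self.match("DIGIT"))
        text = "".join(digits)
        if not text or any(d not in ("0", "1") for d in digits):
            raise ParseError(f"{text!r} is not a valid binary number")
        return text


p = Parser([("DIGIT", "1"), ("DIGIT", "0"), ("DIGIT", "1")])
print(p.binary_number_entry())  # 101
```

The entry version leaves "12" untouched so another alternative can claim it. The exit version commits to the rule and reports a targeted error, which is where the parser's context pays off.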