[antlr-interest] Lexing 7-bit ASCII stream

Gavin Lambert antlr at mirality.co.nz
Tue Apr 21 04:53:33 PDT 2009


At 21:59 21/04/2009, Avid Trober wrote:
>I'm parsing a 7-bit ASCII stream ... 2 questions
>
>Question 1: can't I just fall-thru wrt to lexer rules, where 
>lexer rules are specific-to-general, and avoid indeterminisms at 
>run-time?
[...]
>... // (AND IF NOTHING ABOVE MATCHES, AT LEAST WE'RE MATCHING 
>HERE ... )
>
>CHAR    : '\u0000'..'\u007F'  // any 7-bit US-ASCII character
>              ;

You can specify a catch-all match like so:

   CHAR : .;

If this is the last lexer rule, then it will behave as you're 
expecting.
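
As a rough sketch of that layout (assuming ANTLR 3 syntax; the 
grammar name and the ALPHA rule are made up for illustration, and 
DIGIT is borrowed from your second question):

```antlr
lexer grammar AsciiStream;   // hypothetical name

// Specific rules first.
DIGIT : '0'..'9' ;
ALPHA : 'a'..'z' | 'A'..'Z' ;   // hypothetical rule

// Catch-all last: intended to pick up any single character that
// none of the rules above matched.
CHAR  : . ;
```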

>Question 2: I'm at a loss how to match the notation in the spec 
>I'm writing a grammar for where binary digits are '0' or '1'  and 
>digits are '0'..'9'.  (ABNF-ish)  It is preferred to make the 
>grammar rule names match that (whether lexer or parser, it 
>doesn't matter)

Generally, it's best to have the lexer match as broadly as 
possible (i.e. have DIGIT, not BINARY_DIGIT) and sort it out in 
the parser, where you can use the context to give better error 
messages if you encounter something invalid.

>Can I write a binary_digit parser rule that works with DIGIT 
>above somehow?

Yep.  Depending on the context, you may want to either use a 
lookahead-based entry predicate to avoid entering the rule if the 
DIGITs aren't binary-safe, or an exit predicate that raises an 
error if it turns out that the sequence wasn't valid binary.
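
Something along these lines might do for the two variants.  This 
is only a sketch, assuming ANTLR 3 with the Java target; the 
grammar name and both parser rule names are hypothetical, and you 
would normally keep just one of the two rules:

```antlr
grammar BinaryDigits;   // hypothetical name; Java target assumed

// Entry check: a gated semantic predicate inspects the next token
// before it is consumed, so an enclosing decision should not pick
// this rule unless the upcoming DIGIT is '0' or '1'.
binary_digit
    : { input.LA(1) == DIGIT &&
        input.LT(1).getText().matches("[01]") }?=> DIGIT
    ;

// Exit check: accept any DIGIT, then validate what was matched.
// A validating predicate that fails should raise a
// FailedPredicateException at this point.
binary_digit_checked
    : d=DIGIT { $d.text.matches("[01]") }?
    ;

// Lexer: one broad token for any decimal digit.
DIGIT : '0'..'9' ;
```

The entry form suits places where a non-binary DIGIT should send 
the parser down a different alternative; the exit form suits 
places where only binary is legal and you want the error reported 
right there.  For a friendlier message than the default, the 
validating predicate could be swapped for an ordinary action that 
reports the problem itself.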


