[antlr-interest] Lexing 7-bit ASCII stream

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Wed Apr 22 01:48:49 PDT 2009


Avid Trober wrote:
> Thanks.
> org.antlr.Tool is happy with these two, regardless of which one is 
> above/below the other.
> But won't the DFAs care about the order?
> 
> DQUOTE : '"' ;
> DQUOTE_STRING :  DQUOTE ( ~('"') )* DQUOTE ;

Do you ever want to generate DQUOTE tokens like that? Could you consider 
making DQUOTE a fragment rule?
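A minimal ANTLR 3 sketch of that suggestion, reusing the two rules quoted above. Marking DQUOTE as "fragment" means it can only be referenced from other lexer rules and is never emitted as a token on its own. DQUOTE_STRING is then the only token rule that starts with '"', so there is no longer any ordering between the two for the lexer to resolve.

```antlr
// DQUOTE is only a building block: other lexer rules may use it,
// but the lexer never produces a standalone DQUOTE token.
fragment DQUOTE : '"' ;

// A double-quoted string: opening quote, any non-quote characters, closing quote.
DQUOTE_STRING : DQUOTE ( ~'"' )* DQUOTE ;
```

One consequence to keep in mind: a lone '"' in the input no longer has a token rule of its own, so it has to be covered by some other rule (such as a catch-all) or it will be reported as a lexer error.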

Sam

> ----- Original Message ----- 
> From: "Gavin Lambert" <antlr at mirality.co.nz>
> To: "Avid Trober" <avidtrober at gmail.com>; <antlr-interest at antlr.org>
> Sent: Tuesday, April 21, 2009 6:53 AM
> Subject: Re: [antlr-interest] Lexing 7-bit ASCII stream
> 
> 
>> At 21:59 21/04/2009, Avid Trober wrote:
>>> I'm parsing a 7-bit ASCII stream ... 2 questions
>>>
>>> Question 1: can't I just let input fall through the lexer rules, ordered 
>>> from specific to general, and avoid indeterminisms at run time?
>> [...]
>>> ... // (AND IF NOTHING ABOVE MATCHES, AT LEAST WE'RE MATCHING HERE ... )
>>>
>>> CHAR    : '\u0000'..'\u007F'  // any 7-bit US-ASCII character
>>>              ;
>> You can specify a catch-all match like so:
>>
>>   CHAR : .;
>>
>> If this is the last lexer rule, then it will behave as you're expecting.
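A small ANTLR 3 lexer excerpt illustrating that ordering. The DIGIT rule is an assumption based on the '0'..'9' definition mentioned in Question 2, not Avid's actual grammar. When two rules can match exactly the same input (a single digit here), ANTLR 3 resolves the conflict in favour of the rule listed first, which is why the catch-all belongs at the bottom.

```antlr
// Specific rules first...
DIGIT : '0'..'9' ;

// ...catch-all last: matches any single character no earlier rule claimed.
// Note that '.' is not limited to 7-bit input; Avid's '\u0000'..'\u007F'
// range would restrict CHAR to US-ASCII instead.
CHAR  : . ;
```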
>>
>>> Question 2: I'm at a loss how to match the notation in the spec I'm 
>>> writing a grammar for, where binary digits are '0' or '1' and digits are 
>>> '0'..'9' (ABNF-ish). It is preferred to make the grammar rule names match 
>>> that (whether lexer or parser, it doesn't matter).
>> Generally, it's best to have the lexer match as wide as possible (i.e., have 
>> DIGIT, not BINARY_DIGIT) and sort it out in the parser, where you can use 
>> the context to give better error messages if you encounter something 
>> invalid.
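A sketch of that approach, assuming a combined ANTLR 3 grammar with the Java target. The parser rule names "digit" and "binary_digit" are illustrative, chosen to mirror the spec's production names. The lexer only ever produces DIGIT, and the parser rule that expects a binary digit reports a message saying what was expected in that particular context.

```antlr
// Lexer: one wide token type covering every decimal digit.
DIGIT : '0'..'9' ;

// Parser rules named after the spec's productions (hypothetical names).
digit : DIGIT ;

binary_digit
    : d=DIGIT
      {
        // Context-specific diagnostic; this only reports and keeps parsing.
        if (!$d.text.equals("0") && !$d.text.equals("1")) {
            System.err.println("line " + $d.line + ":" + $d.pos
                + " expected a binary digit ('0' or '1') but found '" + $d.text + "'");
        }
      }
    ;
```

This version accepts the token and carries on; the predicate-based forms described next make the match fail instead.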
>>
>>> Can I write a binary_digit parser rule that works with DIGIT above 
>>> somehow?
>> Yep.  Depending on the context, you may want to either use a 
>> lookahead-based entry predicate to avoid entering the rule if the DIGITs 
>> aren't binary-safe, or an exit predicate that raises an error if it turns 
>> out that the sequence wasn't valid binary.
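Both options sketched in ANTLR 3 syntax, again assuming the Java target and a DIGIT token; the rule names are hypothetical. A gated predicate ({...}?=>) is hoisted into the caller's prediction, so binary_number stops looping as soon as the next DIGIT is not '0' or '1'. A plain predicate placed after the token is a validating predicate: if it evaluates to false, the generated parser throws a FailedPredicateException.

```antlr
// Entry predicate (gated): binary_digit is only entered when the next token
// is a DIGIT whose text is "0" or "1". Because the predicate is gated,
// it is also used when predicting the loop in binary_number.
binary_number
    : binary_digit+
    ;

binary_digit
    : {input.LT(1).getText().equals("0") || input.LT(1).getText().equals("1")}?=> DIGIT
    ;

// Exit (validating) predicate: match any DIGIT first, then check it.
// If the check fails, the generated parser throws FailedPredicateException.
binary_digit_strict
    : d=DIGIT {$d.text.equals("0") || $d.text.equals("1")}?
    ;
```

With the gated version, DIGIT tokens 0 1 2 0 should give a binary_number of the first two digits, leaving the '2' for whatever rule follows; the validating version would instead reject the '2' at the point where it was matched.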
>>


-- 
Sam Barnett-Cormack

