[antlr-interest] Lexing 7-bit ASCII stream

Avid Trober avidtrober at gmail.com
Tue Apr 21 02:59:15 PDT 2009


I'm parsing a 7-bit ASCII stream and have two questions.

Question 1: Can't I just let the lexer rules fall through, ordering them from specific to general, and so avoid nondeterminism at run time?
For example:

NULL       : '\u0000'
           ;
SOH        : '\u0001'
           ;

// ... (EACH CONTROL CHARACTER HAS ITS OWN LEXER RULE)

HTAB       : '\u0009'   // horizontal tab
           ;
LF         : '\u000A'   // line feed
           ;
CR         : '\u000D'   // carriage return
           ;

SP         : '\u0020'   // space
           ;
DQUOTE     : '\u0022'   // double quote
           ;
DIGIT      : '\u0030'..'\u0039'   // 0-9
           ;

// ... (THEN I WANT TO DENOTE RANGES ...)

UPPER_CASE : '\u0041'..'\u005A'   // A-Z
           ;
TWEEN_CASE : '\u005B'..'\u0060'   // [ \ ] ^ _ `
           ;
LOWER_CASE : '\u0061'..'\u007A'   // a-z
           ;

// ... (AND IF NOTHING ABOVE MATCHES, AT LEAST WE'RE MATCHING HERE ...)

CHAR       : '\u0000'..'\u007F'   // any 7-bit US-ASCII character
           ;
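The fall-through ordering asked about can be modeled in a few lines of Python. This is a sketch, not ANTLR-generated code: it assumes that when several single-character rules match the same input, the rule listed first in the grammar wins (ANTLR 3 resolves overlapping token rules this way, reporting a warning and disabling the later rule for that input). Under that assumption CHAR only produces characters that no earlier rule claims. The names `RULES`, `token_type` and `tokenize` are hypothetical, and only a subset of the rules is modeled.

```python
# Python model (not ANTLR output) of first-listed-rule-wins lexing for the
# single-character rules above. Rules are in grammar order, specific to general.
RULES = [
    # (token name, lowest code point, highest code point)
    ("NULL", 0x00, 0x00),
    ("SOH", 0x01, 0x01),
    ("HTAB", 0x09, 0x09),
    ("LF", 0x0A, 0x0A),
    ("CR", 0x0D, 0x0D),
    ("SP", 0x20, 0x20),
    ("DQUOTE", 0x22, 0x22),
    ("DIGIT", 0x30, 0x39),
    ("UPPER_CASE", 0x41, 0x5A),
    ("TWEEN_CASE", 0x5B, 0x60),
    ("LOWER_CASE", 0x61, 0x7A),
    ("CHAR", 0x00, 0x7F),  # catch-all: any 7-bit US-ASCII character
]


def token_type(ch: str) -> str:
    """Return the first rule, in grammar order, whose range contains ch."""
    code = ord(ch)
    for name, lo, hi in RULES:
        if lo <= code <= hi:
            return name
    raise ValueError(f"not 7-bit ASCII: {ch!r}")


def tokenize(text: str) -> list[tuple[str, str]]:
    """Every rule matches exactly one character, so emit one token per character."""
    return [(token_type(ch), ch) for ch in text]


if __name__ == "__main__":
    for name, ch in tokenize('A1 "z"!\r\n'):
        print(f"{name:<10} {ch!r}")
```

In this model the ordering resolves every overlap in favour of the more specific rule ('!' has no rule of its own, so it falls through to CHAR); in ANTLR itself the overlaps may still be reported as warnings when the grammar is generated.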


Question 2: I'm at a loss as to how to match the notation in the (ABNF-like) spec I'm writing a grammar for, where binary digits are '0' or '1' and digits are '0'..'9'. I'd prefer the grammar rule names to match the spec's names (whether they are lexer or parser rules doesn't matter).

Can I write a binary_digit parser rule that works with DIGIT above somehow?  
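One possible approach, sketched here under assumptions: because the lexer assigns token types without knowing which parser rule is active, '0' and '1' always come out as DIGIT, so a parser rule cannot tell them apart by token type alone. Splitting DIGIT into hypothetical tokens ZERO ('0'), ONE ('1') and DIGIT_2_9 ('2'..'9') would let parser rules named after the spec be written as `binary_digit : ZERO | ONE ;` and `digit : ZERO | ONE | DIGIT_2_9 ;`. The Python below only models that idea; all names in it are hypothetical.

```python
# Python model of splitting DIGIT so that parser rules named after the spec
# (binary_digit, digit) can be built from token types. Names are hypothetical.

def lex_digit(ch: str) -> str:
    """Lexer side: '0' and '1' get their own token types instead of DIGIT."""
    if ch == "0":
        return "ZERO"
    if ch == "1":
        return "ONE"
    if "2" <= ch <= "9":
        return "DIGIT_2_9"
    raise ValueError(f"not a digit: {ch!r}")


# Parser side, mirroring the ANTLR parser rules
#   binary_digit : ZERO | ONE ;
#   digit        : ZERO | ONE | DIGIT_2_9 ;
BINARY_DIGIT_TOKENS = {"ZERO", "ONE"}
DIGIT_TOKENS = BINARY_DIGIT_TOKENS | {"DIGIT_2_9"}


def is_binary_digit(token_type: str) -> bool:
    return token_type in BINARY_DIGIT_TOKENS


def is_digit(token_type: str) -> bool:
    return token_type in DIGIT_TOKENS


def parse_binary_number(text: str) -> int:
    """Accept 1*binary_digit (ABNF-style repetition) and return its value."""
    tokens = [lex_digit(ch) for ch in text]
    if not tokens or not all(is_binary_digit(t) for t in tokens):
        raise SyntaxError(f"expected 1*binary_digit, got {text!r}")
    return int(text, 2)
```

The trade-off of this design is that DIGIT disappears as a token: anywhere the grammar previously referenced DIGIT would use the `digit` parser rule instead, and the three new lexer rules would sit where DIGIT was, ahead of CHAR.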


Thanks much for any help.