[antlr-interest] Fwd: A couple of questions for lexing strategy.

Fri Feb 29 19:19:41 PST 2008

---------- Forwarded message ----------
From: ANTLR Mailing List <jstpierre-antlr at mecheye.net>
Date: Feb 29, 2008 9:59 PM
Subject: A couple of questions for lexing strategy.
To: ANTLR Mailing List <jstpierre-antlr at mecheye.net>

Wikipedia defines a lexer as something that contains two parts: a
 scanner and a tokenizer. First, what does each part do on its own,
 and does ANTLR make this distinction?

 On the same subject, why does ANTLR generate a separate lexer method
 for each literal used in a parser rule, when those literals could be
 inlined? Here's an example:

 grammar InlineTest;

 program:
        ( FunctionCall Newline )*;

 FunctionCall:
        FunctionName '(' ArgumentList ')';

 FunctionName
        :       'sin'
        |       'cos'
        |       'tan';

 ArgumentList:
        ArgumentListItem ( ',' ArgumentListItem )*;

 ArgumentListItem
        :       'unsigned'?
                ( 'float' | 'int' | 'double' | 'single' );

 Whitespace
        :       ( ' ' | '\t' )+ { $channel=HIDDEN; };

 Newline
        :       ( '\r'? '\n' )+;

 With lexer rules, this produces nice clean code: every string is
 inlined as a matchString call inside its rule. However, when I
 convert these to parser rules, the generated code becomes very messy,
 because each string gets its own token type. Is this the expected
 behavior for a parser? Or is it bad judgement on my part about what
 should be a lexer rule and what should be a parser rule?
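
 As a rough sketch, the parser-rule variant might look like the
 following. The grammar name InlineTestParser and the lowercased rule
 names are placeholders chosen for illustration (ANTLR treats rules
 starting with a lowercase letter as parser rules); only Whitespace
 and Newline remain lexer rules, so every quoted literal becomes an
 implicit token type:

 grammar InlineTestParser;

 // Parser rules: names start with a lowercase letter.
 program:
        ( functionCall Newline )*;

 functionCall:
        functionName '(' argumentList ')';

 functionName
        :       'sin'
        |       'cos'
        |       'tan';

 argumentList:
        argumentListItem ( ',' argumentListItem )*;

 argumentListItem
        :       'unsigned'?
                ( 'float' | 'int' | 'double' | 'single' );

 // Lexer rules: names start with an uppercase letter.
 Whitespace
        :       ( ' ' | '\t' )+ { $channel=HIDDEN; };

 Newline
        :       ( '\r'? '\n' )+;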

 Also, can ANTLR extract the lexer's token list into its own class?
 When using the parser rule grammar shown above, the list is
 generated twice, once in each class. It seems that putting the tokens
 in their own file would be much cleaner.

 Also, I am still interested in developing the ActionScript port for
 ANTLR, but I'm a student (read: I'm under 18). Would I still be able
 to work on it, given that my signature is not legally valid?

 Thanks guys.