[antlr-interest] Parsing Best Practices---Where to check for predefined names

Gavin Lambert antlr at mirality.co.nz
Tue May 5 04:52:59 PDT 2009


At 13:12 5/05/2009, Matthew M. Burke wrote:
 >1) I could go with something that matches '<' ID expr* '>' and then
 >in the parser action, I can test ID.text and act as appropriate
 >
 >or
 >
 >2) I could do something like
 >
 >lhs
 >   : '<' 'array' expr '>' ->  ^(ARRAY_REF expr)
 >   | '<' 'socket' expr '>' -> ^(SOCKET_REF expr)
 >   | ...
 >   ;
 >
 >Is either alternative especially better than the other?

In general, option #2 is more efficient, since the parser picks the 
alternative by token type instead of comparing ID.text in an action. 
But bear in mind that it introduces new top-level lexer tokens, and 
thus "array" will always be lexed as its own token (with an obscure 
generated name), never as an ID or any other token.  So if "array" 
is not a keyword everywhere in your language, you'll need a bit more 
intelligence in your identifier matching (e.g. id : ID | 'array' | 
'socket';) or you should go with option #1 instead.
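A minimal sketch of option #2 plus the widened identifier rule, written as an ANTLR 3 combined grammar with AST output. Only lhs, ARRAY_REF, SOCKET_REF and the id rule come from the thread; the grammar name Refs, the ID and WS lexer rules, and the one-line expr rule are assumptions added so the fragment hangs together.

```antlr
// Hypothetical combined grammar; Refs, ID, WS and expr are assumptions.
grammar Refs;

options { output = AST; }

// Imaginary token types, used only as AST roots.
tokens { ARRAY_REF; SOCKET_REF; }

// Option #2: the quoted keywords become their own lexer tokens.
lhs
    : '<' 'array'  expr '>' -> ^(ARRAY_REF expr)
    | '<' 'socket' expr '>' -> ^(SOCKET_REF expr)
    ;

// Use this rule wherever an ordinary name is expected, so that
// "array" and "socket" are still accepted as identifiers.
id  : ID | 'array' | 'socket' ;

// Placeholder only: the real expression rule was not shown.
expr : id ;

ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With this sketch, input such as "< array array >" is accepted: the first "array" selects the ARRAY_REF alternative, and the second is matched by expr through id rather than being rejected as a misplaced keyword.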

(If you want to avoid the obscurely named tokens, don't use quoted 
constants in parser rules; create the corresponding lexer rules 
yourself instead.)
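A sketch of that approach, meant to sit in a combined grammar that already declares expr and the ARRAY_REF/SOCKET_REF imaginary tokens. The token names LT, GT, ARRAY and SOCKET are hypothetical, chosen only for illustration; the point is that the generated token types then carry these readable names instead of the auto-generated ones ANTLR 3 gives to quoted literals (typically of the form T__<n>).

```antlr
// Parser rules refer to named tokens instead of quoted constants.
lhs
    : LT ARRAY  expr GT -> ^(ARRAY_REF expr)
    | LT SOCKET expr GT -> ^(SOCKET_REF expr)
    ;

// Still needed if "array" and "socket" may also be used as names.
id  : ID | ARRAY | SOCKET ;

LT     : '<' ;
GT     : '>' ;
// Keyword rules are listed before ID: when two lexer rules match
// exactly the same input, ANTLR 3 resolves in favour of the earlier one.
ARRAY  : 'array' ;
SOCKET : 'socket' ;
ID     : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
```

Longer words such as "arrays" are still lexed as a single ID, because the lexer keeps matching while ID can continue; only the exact text "array" becomes an ARRAY token.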


