[antlr-interest] Adding a new option to ANTLR, "defaultRule": it's possible, but would it be well accepted?

Daniel Shane lachinois at hotmail.com
Tue Jun 20 09:41:22 PDT 2006


Hi John!

Yes, I really like this solution. It's a bit more complex than using a predicate, but it's faster, since it never has to backtrack. If testLiterals were not available, I think the solution would quickly become very complex, but since only a few keywords are not of fixed length, I can live with that.
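For readers unfamiliar with testLiterals: it lets one general rule (STRING here) match the whole word, and only afterwards checks the matched text against the keyword table built from the tokens{} section. That is why AND needs no predicate and no backtracking. The following is a simplified Java model of that lookup, not ANTLR's generated code; the class name and the token-type numbers are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (NOT ANTLR's generated code) of testLiterals=true:
// after a rule such as STRING matches, its text is looked up in the
// literals table, and a hit replaces the rule's token type.
public class LiteralsDemo {
    // Hypothetical token-type numbers, for illustration only.
    static final int STRING = 4;
    static final int AND = 5;

    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("AND", AND); // mirrors: tokens { AND = "AND"; }
    }

    // Returns the keyword's type if text is a literal, else the rule's own type.
    static int testLiterals(String text, int ruleType) {
        Integer keyword = LITERALS.get(text);
        return keyword != null ? keyword : ruleType;
    }

    public static void main(String[] args) {
        System.out.println(testLiterals("AND", STRING) == AND);
        System.out.println(testLiterals("ANDROID", STRING) == STRING);
    }
}
```

The point of the design is that keyword recognition costs one hash lookup after a single maximal match, instead of competing rules that the lexer must choose between up front.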

But my question is: is this solution really better than having some degree of control over the ordering of the rules? We are all aware that if such control were possible, we could reduce the size of this simple lexer by half or even more.

I think that in most cases, simply having control over the last rule in the chain would be enough.

I'm pretty sure it would be dead simple to enhance ANTLR with an option such as "defaultRule". This option would apply to the whole lexer and would look something like "defaultRule=STRING". The default rule is invoked if and only if no other rule can be triggered by the lexer. If the default rule is referenced from another rule, the defaultRule option does not change its behavior there in any way.

The only time "defaultRule" would change anything is when the lexer cannot match any other rule based on the lookahead it has.
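To make the proposed semantics concrete, here is a small Java model of the decision the lexer would make. Everything in it is hypothetical: defaultRule is only a proposal, the class and method names are invented, and each rule is reduced to a single-character lookahead test, whereas a real ANTLR lexer may use more lookahead. The non-default rules are assumed to have disjoint lookahead sets (ANTLR warns about nondeterminism otherwise); the default rule is consulted only when none of them applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical model of the proposed lexer option defaultRule=STRING.
// Each rule is represented only by a test on its first lookahead character.
public class DefaultRuleLexerModel {
    private final Map<String, IntPredicate> rules = new LinkedHashMap<>();
    private final String defaultRule; // null = no default (current behavior)

    DefaultRuleLexerModel(String defaultRule) { this.defaultRule = defaultRule; }

    // Registers a non-default rule; lookahead sets are assumed disjoint.
    void rule(String name, IntPredicate lookahead) { rules.put(name, lookahead); }

    // Returns the name of the rule the lexer would invoke for lookahead char c.
    String predict(int c) {
        for (Map.Entry<String, IntPredicate> e : rules.entrySet()) {
            if (e.getValue().test(c)) return e.getKey();
        }
        // Only reached when no other rule can start with c.
        if (defaultRule != null) return defaultRule;
        throw new IllegalStateException("unexpected char: '" + (char) c + "'");
    }

    public static void main(String[] args) {
        DefaultRuleLexerModel lexer = new DefaultRuleLexerModel("STRING");
        lexer.rule("WS", c -> c == ' ' || c == '\t');
        lexer.rule("EOL", c -> c == '\r' || c == '\n');
        lexer.rule("N_PROXIMITY", c -> c == '/');
        System.out.println(lexer.predict('g')); // falls back to STRING
        System.out.println(lexer.predict('/')); // N_PROXIMITY
    }
}
```

Constructed with a null default, the same predict('g') call throws instead, which mirrors the lexer's current error when no rule matches.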

Does anyone have any objection to such an option? I'm really tempted to implement it and submit the changes, although I know we have now moved on to ANTLR v3, so I'm not sure whether ANTLR v2 is still open to enhancements.

Daniel Shane

> I *REALLY* dislike predicates - altho they are essential in some situations.
> 
> I think even with a predicate you would still need to inspect the lookahead
> character to see if it was a delimiter (e.g. to make "/1a" be a STRING, while
> "/1 " is a N_PROXIMITY).
> 
> It is a failing of mine that I spend *WAY* too much time trying to get rid of
> predicates.  Not always having a good cost-benefit ratio ;-(
> 
> Anyway, how about this lexer without predicates?
> 
> (I assume that " / " is a STRING (no WS), and likewise "/google", "g/g",
> "g*g/g/" are all STRINGs and that "/*", "**", "a*b/c*" are all
> PREFIXED_STRINGS)
> 
> -------------------------
> class LuceneLexer extends Lexer;
> 
> tokens {
>     AND = "AND";
>     STRING;
>     PREFIXED_STRING;
>     N_PROXIMITY;
> }
> 
> STRING options{ testLiterals=true; } :
>         ~( '/' | ' ' | '\t' | '\n' | '\r' )
>         ( ~( ' ' | '\t' | '\n' | '\r' ) )*
>         { if ((text.length() > 1) && (text.charAt(text.length()-1) == '*')) {
>             $setType(PREFIXED_STRING);
>             text.setLength(text.length() - 1);
>           }
>         }
>     ;
> 
> N_PROXIMITY :
>         ( '/' { $setType(STRING);} )
>         ( ('0'..'9')+ { $setType(N_PROXIMITY); } )?
> 
>         ( ( /*empty*/ {/* need to strip leading '/' here */} )
> 
>         | ( /*NB: leading '/' should be kept on this path */
>             ~( '0'..'9' | ' ' | '\t' | '\n' | '\r' ) { $setType(STRING); }
>              ( ~( ' ' | '\t' | '\n' | '\r' ) )*
>              { if(text.charAt(text.length()-1)=='*') {
>                  $setType(PREFIXED_STRING);
>                  text.setLength(text.length() - 1);
>                }
>              }
>           )
>         )
>     ;
> 
> WS  : ( ' ' | ('\t' { tab(); }) ) { $setType(Token.SKIP); } ;
> EOL : ( '\r' ( '\n' )? | '\n' ) { newline(); $setType(Token.SKIP); } ;
> -------------------------
> 
> Hope this helps...
>    -jbb
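The following is a plain-Java sketch of how the grammar above, as written, classifies a single whitespace-free word, which can be handy for checking the expected results before generating the lexer. The class and method names are hypothetical, the AND keyword lookup is left out, and N_PROXIMITY keeps its leading '/' because the grammar still leaves stripping it as a to-do.

```java
// Plain-Java model (not generated code) of the grammar's classification of
// one whitespace-free word. Returns "TYPE:text". Literal lookup is omitted.
public class LuceneWordModel {
    static boolean isDigit(char ch) { return ch >= '0' && ch <= '9'; }

    static String classify(String w) {
        if (w.isEmpty()) throw new IllegalArgumentException("empty word");
        if (w.charAt(0) != '/') {
            // STRING rule: strip a trailing '*' only when something precedes it.
            if (w.length() > 1 && w.endsWith("*")) {
                return "PREFIXED_STRING:" + w.substring(0, w.length() - 1);
            }
            return "STRING:" + w;
        }
        // N_PROXIMITY rule: '/' followed by optional digits.
        int i = 1;
        while (i < w.length() && isDigit(w.charAt(i))) i++;
        if (i == w.length()) {
            // Empty alternative: "/" alone is STRING, "/" + digits is N_PROXIMITY.
            return (i > 1 ? "N_PROXIMITY:" : "STRING:") + w;
        }
        // Second alternative: any other character makes the word a STRING
        // (leading '/' kept); a trailing '*' makes it PREFIXED_STRING.
        if (w.endsWith("*")) return "PREFIXED_STRING:" + w.substring(0, w.length() - 1);
        return "STRING:" + w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"/1a", "/1", "/", "/google", "/*", "a*b/c*"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

This reproduces jbb's stated assumptions: "/1a" stays a STRING while "/1" is an N_PROXIMITY, and "/*", "**" and "a*b/c*" come out as PREFIXED_STRING with the '*' removed.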
