[antlr-interest] Lexer Rule Ordering, how to obtain a default token rule??

John B. Brodie jbb at acm.org
Mon Jun 19 15:28:13 PDT 2006


on Mon, 19 Jun 2006 16:32:36, Daniel Shane asked:
>Hi!

Greetings!

>I'm writing a lexer for a new Lucene query parser, and I thought of giving
>ANTLR a try with my project. However, I'm faced with a problem I can't seem to
>resolve...
>
>To make the problem simple, imagine that you have 4 types of tokens :
>
>  a) AND (matches the string "AND")
>  b) PREFIXED_STRING (matches any string ending with *, i.e. google*)
>  c) STRING (anything that is separated by WS and is not one of the above)
>
>...other info, including a complex trial lexer, snipped...

(So the 4 tokens are AND, STRING, PREFIXED_STRING, and WS, with WS to be
ignored, right?)

I do not think ANTLR has the concept of a default token.

However, in this case your reserved word, "AND", is also matched by your general
STRING pattern, so you can use the testLiterals option.

Maybe testLiterals can be thought of as a default token rule with a twist:
first match the general string (or identifier) pattern and then check whether
the result should be specialized into one of the reserved words, rather than
trying all the special-case reserved words first and falling back to a default
when they all fail.
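To make that ordering concrete, here is a tiny Java sketch of the same idea outside of ANTLR: the word is assumed to have already been matched as a whole by the general rule, and only then is it looked up in a table of reserved words. The class name, the LITERALS map and the typeOf helper are made up for illustration (Map.of needs Java 9 or later); this is not the code ANTLR generates.

```java
import java.util.Map;

// Sketch of the "match first, then specialize" idea behind testLiterals.
// Hypothetical names; this is not ANTLR's generated code.
public class LiteralsLookup {
    // Hypothetical literals table: reserved word text -> token type name.
    private static final Map<String, String> LITERALS = Map.of("AND", "AND");

    // The general rule has already matched `text`; return the reserved-word
    // type if the text is in the table, otherwise the default type STRING.
    public static String typeOf(String text) {
        return LITERALS.getOrDefault(text, "STRING");
    }

    public static void main(String[] args) {
        for (String word : new String[] {"AND", "lucene", "ANDROID", "and"}) {
            System.out.println(word + " -> " + typeOf(word));
        }
        // AND -> AND
        // lucene -> STRING
        // ANDROID -> STRING
        // and -> STRING
    }
}
```

Because the whole word is matched before the lookup, a word like "ANDROID" simply stays a STRING; the lookup in this sketch is case-sensitive, so "and" stays a STRING too.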

Anyway, does this Lexer do what you need?

-------------------------
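class LuceneLexer extends Lexer;

tokens {
    AND = "AND";        // reserved word, entered into the literals table
    STRING;
    PREFIXED_STRING;    // no rule of its own; assigned via $setType in STRING
}

// Match a run of characters other than '*' and whitespace, then (testLiterals)
// look the matched text up in the literals table so "AND" comes back as AND.
// A trailing '*' turns the token into PREFIXED_STRING and is cut from the text.
STRING options{ testLiterals=true; } :
    ( ~( '*' | ' ' | '\t' | '\n' | '\r' ) )+
    ( '*' { $setType(PREFIXED_STRING); text.setLength(text.length() - 1); } )?
    ;

// Blanks and line ends only separate tokens; they are skipped.
WS  : ( ' ' | ('\t' { tab(); }) ) { $setType(Token.SKIP); } ;
EOL : ( '\r' ( '\n' )? | '\n' ) { newline(); $setType(Token.SKIP); } ;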

-------------------------

Note: you did not say how the input strings "a * b" or "c*d" should be
handled, so the above Lexer probably does not do the Right Thing on those
inputs.
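If you want to poke at inputs like these without running ANTLR, here is a rough hand-written Java approximation of the LuceneLexer grammar above. It is only a sketch: the class and method names are invented, tokens come back as "TYPE:text" strings, and it consults the literals table only when there is no trailing '*' (an assumption of the sketch, so "AND*" comes out as PREFIXED_STRING here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Rough hand-written approximation of the LuceneLexer grammar (a sketch,
// not ANTLR output). Tokens are returned as "TYPE:text" strings.
public class LuceneLexerSketch {
    // Hypothetical literals table, standing in for AND = "AND" in tokens { }.
    private static final Map<String, String> LITERALS = Map.of("AND", "AND");

    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    public static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (isBlank(c)) {          // WS and EOL: skipped
                i++;
                continue;
            }
            if (c == '*') {            // no rule can start with a bare '*'
                throw new IllegalArgumentException("unexpected char '*' at index " + i);
            }
            int start = i;             // STRING: ( ~('*' | blank) )+
            while (i < in.length() && in.charAt(i) != '*' && !isBlank(in.charAt(i))) {
                i++;
            }
            String text = in.substring(start, i);
            String type;
            if (i < in.length() && in.charAt(i) == '*') {
                i++;                   // trailing '*': consumed but not kept in the text
                type = "PREFIXED_STRING";
            } else {
                type = LITERALS.getOrDefault(text, "STRING"); // testLiterals step
            }
            tokens.add(type + ":" + text);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("google* AND lucene"));
        // [PREFIXED_STRING:google, AND:AND, STRING:lucene]
        System.out.println(tokenize("c*d"));
        // [PREFIXED_STRING:c, STRING:d]
        try {
            tokenize("a * b");
        } catch (IllegalArgumentException e) {
            System.out.println("a * b -> " + e.getMessage());
            // a * b -> unexpected char '*' at index 2
        }
    }
}
```

With this approximation, "c*d" becomes PREFIXED_STRING "c" followed by STRING "d", and the lone '*' in "a * b" matches no rule at all, so those are the cases where you would have to decide what you actually want.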

Hope this helps...
   -jbb

