[antlr-interest] Lexer Rule Ordering, how to obtain a default token rule??
John B. Brodie
jbb at acm.org
Tue Jun 20 09:19:05 PDT 2006
I *REALLY* dislike predicates, although they are essential in some situations.
I think that even with a predicate you would still need to inspect the lookahead
character to see whether it is a delimiter (e.g. to make "/1a" a STRING, while
"/1 " is an N_PROXIMITY).
It is a failing of mine that I spend *WAY* too much time trying to get rid of
predicates, and the cost-benefit ratio is not always good ;-(
Anyway, how about this lexer without predicates?
(I assume that " / " is a STRING (no WS), and likewise that "/google", "g/g",
and "g*g/g/" are all STRINGs, and that "/*", "**", and "a*b/c*" are all
PREFIXED_STRINGs.)
-------------------------
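class LuceneLexer extends Lexer;
options {
    // Assumed addition, not in the original post: ANTLR 2 complements ~( ... )
    // against the lexer's character vocabulary, so it is set explicitly here.
    charVocabulary = '\3'..'\377';
}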
tokens {
    AND = "AND";
    STRING;
    PREFIXED_STRING;
    N_PROXIMITY;
}
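STRING options { testLiterals=true; } :
        ~( '/' | ' ' | '\t' | '\n' | '\r' )
        ( ~( ' ' | '\t' | '\n' | '\r' ) )*
        {
            // A trailing '*' on a token longer than one character marks a
            // prefix query; the '*' itself is dropped from the token text.
            if ((text.length() > 1) && (text.charAt(text.length()-1) == '*')) {
                $setType(PREFIXED_STRING);
                text.setLength(text.length() - 1);
            }
        }
    ;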
N_PROXIMITY :
        ( '/' { $setType(STRING); } )
        ( ('0'..'9')+ { $setType(N_PROXIMITY); } )?
        (   ( /*empty*/ {/* need to strip leading '/' here */} )
        |   ( /*NB: leading '/' should be kept on this path */
              ~( '0'..'9' | ' ' | '\t' | '\n' | '\r' ) { $setType(STRING); }
              ( ~( ' ' | '\t' | '\n' | '\r' ) )*
              { if (text.charAt(text.length()-1) == '*') {
                    $setType(PREFIXED_STRING);
                    text.setLength(text.length() - 1);
                }
              }
            )
        )
    ;
WS : ( ' ' | ('\t' { tab(); }) ) { $setType(Token.SKIP); } ;
EOL : ( '\r' ( '\n' )? | '\n' ) { newline(); $setType(Token.SKIP); } ;
-------------------------
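
For checking the assumptions listed above, here is a rough plain-Java sketch of
the classification the grammar is meant to produce for one whitespace-delimited
chunk of input. It is not generated by ANTLR and is not part of the lexer; the
names LuceneTokenSketch, Kind, classify and allDigits are made up for this
illustration. It only decides the token type and does not strip the trailing
'*' or the leading '/' from the text.

```java
public class LuceneTokenSketch {
    // Hypothetical mirror of the token types declared in the grammar.
    enum Kind { STRING, PREFIXED_STRING, N_PROXIMITY }

    // Classifies one chunk of input that contains no whitespace.
    static Kind classify(String chunk) {
        if (chunk.isEmpty()) {
            throw new IllegalArgumentException("empty chunk");
        }
        // '/' followed by one or more digits and nothing else: proximity operator.
        if (chunk.length() > 1 && chunk.charAt(0) == '/' && allDigits(chunk, 1)) {
            return Kind.N_PROXIMITY;
        }
        // A trailing '*' on anything longer than one character: prefix query.
        if (chunk.length() > 1 && chunk.charAt(chunk.length() - 1) == '*') {
            return Kind.PREFIXED_STRING;
        }
        // Everything else, including a lone "/" or a lone "*", is a STRING.
        return Kind.STRING;
    }

    // True if every character of s from index 'from' onward is an ASCII digit.
    static boolean allDigits(String s, int from) {
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "/", "/google", "g/g", "g*g/g/", "/*", "**", "a*b/c*", "/1", "/1a"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Running main prints one line per sample, which can be compared by hand against
the token types the lexer reports for the same input.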
Hope this helps...
-jbb