[antlr-interest] grammar newbie: "escaping" tokens
daniele fusi
dafusi at gmail.com
Sat Dec 29 06:19:10 PST 2007
Hi list, sorry for the naive question, but I'm just beginning with this
wonderful tool and experimenting a bit, and I cannot find simple
examples for ANTLR 3. I've created a simple (tree) grammar representing a
dummy query language, where terms are combined with boolean operators (&
| ! for and, or, not) and grouped with parentheses (). Also, a term
can include a single wildcard * or ? at its beginning or end, or be
followed by a tilde plus a floating point number (indicating a fuzzy
match threshold). So far I have (I think) managed to define this simple
grammar, but now I'd like to add another option, i.e. using a full
regular expression as a query term, enclosed in some delimiter
character (say #). So, I'd like to have queries like:
term1 & (term2 | #regexforterm3#)
Of course, regexes can include characters I have reserved for other
purposes in this grammar, such as parentheses. I simply
want to tell the parser that whatever it finds in #...# should be
treated as a term, so that it does not misinterpret characters like
()?* etc. I've tried to define a new token alternative like
'#'!(LETTER|REGEXMETA)+'#'!
but this does not seem to work in every case. Here's my sample grammar
definition; could anyone give a hint?
Thanks!
========================================
grammar QueryTree;
...
query : expr EOF;
expr : andExpr (OR^ andExpr)*;
andExpr : notExpr (AND^ notExpr)*;
notExpr : NOT^? term;
term : TOKEN
| '('!expr')'!;
// LEXICALS
AND : '&';
OR : '|';
NOT : '!';
WS : (' '|'\t'|'\n'|'\r')+ {$channel=HIDDEN;};
TOKEN : LETTER+
| '#'!(LETTER|REGEXMETA)+'#'!
| LETTER+ WILDCARD
| WILDCARD LETTER+
| LETTER+ FUZZY
;
fragment
LETTER : 'a'..'z'|'A'..'Z'|'\''|
'\u0391'..'\u03A9'|'\u03B1'..'\u03C9'|
'\u03DC';
fragment
WILDCARD: '*'|'?';
fragment
DIGIT : '0'..'9';
fragment
FUZZY : '~'(DIGIT+('.'DIGIT+)?)?;
fragment
REGEXMETA
: ('['|']'|'\\'|'^' | '$' | '.' | '|' | '?' | '*' | '+' | '(' | ')');
========================================
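One possible way to get the behaviour described above, offered only as
an untested sketch: give the delimited regex its own lexer rule that
consumes every character up to the closing #, so the lexer never tries
to interpret ( ) ? * | inside it. The REGEX rule name and the
delimiter-stripping action are assumptions for illustration (they are
not part of the grammar above). The action assumes the Java target, and
a regex that itself contains # would need an escape convention that
this sketch does not handle.

```antlr
// Hypothetical REGEX token: '#', one or more characters other than '#', then '#'.
REGEX : '#' ( ~'#' )+ '#'
        // Assumption (Java target): drop the two '#' delimiters from the token text.
        { setText(getText().substring(1, getText().length() - 1)); }
      ;

// term then accepts a plain TOKEN, a REGEX, or a parenthesized expression.
term : TOKEN
     | REGEX
     | '('! expr ')'!
     ;
```

With a rule like this, the '#' alternative inside TOKEN and the
REGEXMETA fragment would presumably no longer be needed, since the
regex body is matched by exclusion rather than by listing the allowed
characters.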