[antlr-interest] grammar newbie: "escaping" tokens

daniele fusi dafusi at gmail.com
Sat Dec 29 06:19:10 PST 2007


Hi list, sorry for the naive question, but I'm just beginning with this
wonderful tool and experimenting a bit, and I cannot find simple
examples for ANTLR 3. I've created a simple (tree) grammar representing
a dummy query language, where terms are combined with boolean operators
(& | ! for and, or, not) and grouped with parentheses (). Also, a term
can include a single wildcard * or ? at its beginning or end, or be
followed by a tilde plus a floating point number (indicating a fuzzy
match threshold). So far I have (I think) managed to define this simple
grammar, but now I'd like to add another option, i.e. using a full
regular expression as a query term, enclosed in some delimiter
character (say #). So, I'd like to have queries like:

term1 & (term2 | #regexforterm3#)

Of course, regexes can include characters I have reserved for other
purposes in this grammar, such as parentheses. I'd simply like to tell
the parser that whatever it finds in #...# should be treated as a
single term, so that it does not misinterpret characters like ()?* etc.
I've tried to define a new token alternative like

'#'!(LETTER|REGEXMETA)+'#'!

but this does not seem to work in every case. Here's my sample grammar
definition; could anyone give me a hint?
Thanks!

========================================
grammar QueryTree;
...
query	:	expr EOF;
expr	:	andExpr (OR^ andExpr)*;
andExpr	:	notExpr (AND^ notExpr)*;
notExpr	:	NOT^? term;
term	:	TOKEN
	|	'('!expr')'!;

// LEXICALS
AND	:	'&';
OR	:	'|';
NOT	:	'!';
WS	:	(' '|'\t'|'\n'|'\r')+ {$channel=HIDDEN;};
TOKEN	:	LETTER+
	|	'#'!(LETTER|REGEXMETA)+'#'!
	| 	LETTER+ WILDCARD
	|	WILDCARD LETTER+
	|	LETTER+ FUZZY
;
fragment
LETTER	:	'a'..'z'|'A'..'Z'|'\''|
		'\u0391'..'\u03A9'|'\u03B1'..'\u03C9'|
		'\u03DC';
fragment
WILDCARD:	'*'|'?';
fragment
DIGIT	:	'0'..'9';
fragment
FUZZY	:	'~'(DIGIT+('.'DIGIT+)?)?;
fragment
REGEXMETA
	:	('['|']'|'\\'|'^' | '$' | '.' | '|' | '?' | '*' | '+' | '(' | ')');
========================================
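
To make the intended behaviour concrete, here is a minimal standalone
sketch in Python (not ANTLR, and not a fix for the grammar above) of the
tokenization being asked for: everything between two # characters is
taken as one term before any operator or parenthesis rule gets a chance
to match. The names TOKEN_RE and tokenize are hypothetical, the token
kinds only loosely mirror the grammar, and the letter class is reduced
to ASCII letters and the apostrophe for brevity.

```python
import re

# Hypothetical token pattern for illustration only (not generated by ANTLR).
# Alternatives are tried in order, so the #...# form is checked first and
# its contents are never split into operators or parentheses.
TOKEN_RE = re.compile(r"""
    (?P<REGEX>\#[^#]+\#)        # anything between two '#' is one term
  | (?P<AND>&)
  | (?P<OR>\|)
  | (?P<NOT>!)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<WS>\s+)
  | (?P<TERM>                   # plain term, ASCII letters only (simplified)
        [*?][A-Za-z']+                            # leading wildcard
      | [A-Za-z']+(?:[*?]|~(?:\d+(?:\.\d+)?)?)?   # trailing wildcard or fuzzy
    )
""", re.VERBOSE)


def tokenize(query):
    """Return (kind, text) pairs; whitespace is dropped, '#' delimiters stripped."""
    tokens = []
    pos = 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "REGEX":
            text = text[1:-1]  # drop the surrounding '#' delimiters
        if kind != "WS":
            tokens.append((kind, text))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("alpha & (beta* | #^a(b|c)+$#)"):
        print(token)
```

In this sketch the parentheses, | and + inside #^a(b|c)+$# all end up
inside a single REGEX token. A regex that itself contains # would need
some escaping convention, which is left out here.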

