[antlr-interest] Date matching instead of dot pattern

Mon Dec 28 16:41:34 PST 2009

Ben,

You need a lexer rule that gobbles up the entire quoted string as a single token. Removing the quotes from the result can be done either in the lexer rule or in the parser rule that builds the AST node.
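
A minimal sketch of what that could look like, assuming the Java target and that nothing else in the grammar needs a standalone '"' token. The token name QUOTED_STRING is made up for illustration, and this version strips the quotes in the lexer:

```antlr
// Hypothetical token name; matches everything from the opening quote to the
// next double quote as one token, so DATE and the other lexer rules are never
// tried on the quoted text.
QUOTED_STRING
	:	'"' ( ~'"' )* '"'
		// drop the surrounding quotes from the token text (Java target)
		{ setText(getText().substring(1, getText().length() - 1)); }
	;

// The parser rule then matches the single token instead of '"' .+ '"'.
quotedSearch
	:	QUOTED_STRING
	;
```

With this approach the whitespace inside the quotes is part of the token text, so the channel switching in quotedSearch should no longer be needed. Note that the AST shape changes: you get one QUOTED_STRING node rather than a " node with one child per token. An input with an unterminated quote will produce a lexer error unless you handle that case separately.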

--John

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ben Dotte
Sent: Monday, December 28, 2009 4:50 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Date matching instead of dot pattern

Hi,

I'm trying to troubleshoot why an input is matching to a lexer rule instead of a dot pattern in the parser and could use some help. The grammar is being used to interpret user-entered searches, and the idea is that a search surrounded by double quotes should be interpreted as-is. The dot pattern I'm using has worked for everything I have come across so far, until someone pointed this search out to me:

"3/4 Abstract w/Talent"

The AST I get back for this is a " node with (Abstract w / Talent), as if the "3/4" part were never entered. If I remove my DATE lexer rule and its associated parser rules, it works fine.

Here is a snippet of the parser rules:

negationSearch
	:	('-'^)? (quotedSearch | dateRangeSearch | comparisonSearch | idSearch | wildcardSearch | term)
	;

wildcardSearch
	:	TEXT_WITH_WILDCARD	-> ^(WILDCARD TEXT_WITH_WILDCARD)
	;

idSearch
	:	'#'^ TEXT
	;

comparisonSearch
	:	'>'^ TEXT
	|	'<'^ TEXT
	;

quotedSearch
	:	// within double quotes, output whitespace to default channel
		// (don't ignore whitespace, in other words)
		{ ((SwitchingCommonTokenStream)input).setTokenTypeChannel(WHITESPACE, Token.DEFAULT_CHANNEL); }
		'"'^
		.+ // non-greedy by default
		{ ((SwitchingCommonTokenStream)input).setTokenTypeChannel(WHITESPACE, Token.HIDDEN_CHANNEL); }
		'"'!
	;

dateRangeSearch
	:	'[' DATE TO DATE ']'	-> ^(DATE_BETWEEN DATE+)
	|	'[' AFTER DATE ']'	-> ^(DATE_AFTER DATE)
	|	'[' BEFORE DATE ']'	-> ^(DATE_BEFORE DATE)
	;

subSearch
	:	'('! orSearch ')'!
	;

term	:	SEPARATOR* (t=anyText	-> $t)
		(SEPARATOR t2=anyText	-> ^(AND $term $t2))*
		SEPARATOR*
	;

anyText	:	(TO | AFTER | BEFORE | DATE | TEXT)
	;

The related lexer rules look like this:

fragment NUM
	:	('0'..'9') ;
DATE	:	('0'..'1')? NUM '/' ('0'..'3')? NUM '/' NUM NUM NUM NUM ;

I would expect the dot in quotedSearch to match "3/4", rather than the DATE lexer rule matching it, since I am already inside the double quotes. Is there something I can do to fix this?
(I'm using antlr 3.1.2.)

Thanks,
Ben
