[antlr-interest] v4 bug: &x and &~x are including match in token

Peter Boughton boughtonp at gmail.com
Sat Jan 21 11:26:50 PST 2012


My understanding is that the & operator is intended to act as a
lookahead - ensuring the following content matches, but not including
it in the token text.
( as described here:
http://www.antlr.org/wiki/display/~admin/ANTLR+v4+lexers#ANTLRv4lexers-Requirements
)

However, this is not the behaviour I'm seeing - I'm getting the
lookahead match text included as part of the token (which prevents it
from being included in the next token, and thus causes problems).

OUT_ATTR_ENABLE_OUTPUT
	: 'output' WS* EQUALS WS* ATTR_TRUE
	| 'output' WS+ &~'='
	| 'output' &'>'
	{ OutputEnabled = true; }
	;

Sample input:
	<cffunction output> #Special#  </cffunction>
	<cffunction output > #Special#  </cffunction>
	<cffunction output anotherattr > #Special#  </cffunction>

Captured token:
	OUT_ATTR_ENABLE_OUTPUT = [output>]
	OUT_ATTR_ENABLE_OUTPUT = [output >]
	OUT_ATTR_ENABLE_OUTPUT = [output a]


I have used &~x in other situations and it seemed to work, although
those may simply have been cases where it didn't matter that the
lookahead match was included.

