[antlr-interest] Look-ahead changes when rules lifted up

Paul J. Lucas dude at darkfigure.org
Mon Dec 2 19:22:49 PST 2002


Given:

	elementConstructor
		: XML_TAG_START { pushContentLexer(); } QNAME attributeList
		  (xmlEmptyElement | xmlNonEmptyElement) { popLexer(); } ;

	xmlEmptyElement
		: XML_EMPTY_ELEMENT_END ;

	xmlNonEmptyElement
		: XML_END_TAG (elementContent)*
		  XML_END_TAG_START QNAME (S)? XML_TAG_END ;

where:

	XML_TAG_START: '<' ;
	XML_EMPTY_ELEMENT_END: "/>" ;
	XML_END_TAG: '>' ;
	XML_END_TAG_START: "</" ;

With sufficient look-ahead, this doesn't work properly. When parsing:

	<a></a>, <b/>
	      ^______________here

the parser needlessly looks ahead at the ',' even though the '>' alone is sufficient to decide. The trouble is that the ',' is then lexed by the wrong lexer (the content lexer hasn't been popped yet).

If, however, I move the popLexer() action code out of the
elementConstructor rule and (redundantly) into the xmlEmptyElement and
xmlNonEmptyElement rules as in:

	xmlEmptyElement
		: XML_EMPTY_ELEMENT_END { popLexer(); } ;

	xmlNonEmptyElement
		: XML_END_TAG (elementContent)*
		  XML_END_TAG_START QNAME (S)? XML_TAG_END { popLexer(); } ;

then everything works fine: the parser no longer looks ahead past the
'>' to the ',', the content lexer is popped, and the previous lexer
returns the comma correctly.
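The symptom can be modeled outside ANTLR. What follows is a minimal Python sketch, not ANTLR's generated code or API: `Parser`, `LA`, `default_lexer`, `content_lexer`, and `run` are hypothetical names. It assumes the parser keeps a lookahead buffer that is filled on demand from whichever lexer is on top of a lexer stack. Under that assumption, a token pulled into the buffer before `popLexer()` runs has already been produced by the content lexer and is not re-lexed after the pop.

```python
from collections import deque


class CharStream:
    """Shared character input that every lexer on the stack reads from."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_char(self):
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch


def default_lexer(chars):
    # Hypothetical outer lexer: knows what a comma is.
    ch = chars.next_char()
    if ch is None:
        return ("EOF", "")
    if ch == ",":
        return ("COMMA", ch)
    return ("CHAR", ch)


def content_lexer(chars):
    # Hypothetical XML-content lexer: has no notion of a comma.
    ch = chars.next_char()
    if ch is None:
        return ("EOF", "")
    if ch == ">":
        return ("XML_TAG_END", ch)
    return ("TEXT", ch)


class Parser:
    def __init__(self, text):
        self.chars = CharStream(text)
        self.lexers = [default_lexer]  # lexer stack; top is lexers[-1]
        self.buffer = deque()          # tokens already fetched by lookahead

    def push_lexer(self, lexer):
        self.lexers.append(lexer)

    def pop_lexer(self):
        # Only affects tokens fetched from now on; buffered tokens stay as lexed.
        self.lexers.pop()

    def LA(self, i):
        # Fill the buffer up to depth i using whatever lexer is on top right now.
        while len(self.buffer) < i:
            self.buffer.append(self.lexers[-1](self.chars))
        return self.buffer[i - 1]

    def consume(self):
        self.LA(1)
        return self.buffer.popleft()


def run(peek_two):
    p = Parser(">,")
    p.push_lexer(content_lexer)        # plays the role of pushContentLexer()
    if peek_two:
        p.LA(2)                        # a depth-2 decision made before the pop
    assert p.consume()[0] == "XML_TAG_END"
    p.pop_lexer()                      # plays the role of popLexer()
    return p.consume()


print(run(True))   # ('TEXT', ',')
print(run(False))  # ('COMMA', ',')
```

With `peek_two=True` the comma is fetched while the content lexer is still on top, so it comes back as content text even after the pop; with `peek_two=False` it is fetched only after the pop and the outer lexer returns it as a comma.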

Why should moving the action code make a difference? This is really
annoying, and it took dumb luck and a lot of trial and error to figure
out how to work around it.

- Paul

