[antlr-interest] Look-ahead changes when rules lifted up
Paul J. Lucas
dude at darkfigure.org
Mon Dec 2 19:22:49 PST 2002
Given:
    elementConstructor
        : XML_TAG_START { pushContentLexer(); } QNAME attributeList
          (xmlEmptyElement | xmlNonEmptyElement) { popLexer(); } ;

    xmlEmptyElement
        : XML_EMPTY_ELEMENT_END ;

    xmlNonEmptyElement
        : XML_END_TAG (elementContent)*
          XML_END_TAG_START QNAME (S)? XML_TAG_END ;
where:
    XML_TAG_START: '<' ;
    XML_EMPTY_ELEMENT_END: "/>" ;
    XML_END_TAG: '>' ;
    XML_END_TAG_START: "</" ;
with sufficient look-ahead, this doesn't work properly. When parsing:
<a></a>, <b/>
^______________here
it needlessly looks ahead at the ',' even though looking at the '>' alone is sufficient. The trouble is that the ',' is then tokenized by the wrong lexer (the content lexer hasn't been popped yet).
If, however, I move the popLexer() action code out of the
elementConstructor rule and (redundantly) into the xmlEmptyElement and
xmlNonEmptyElement rules as in:
    xmlEmptyElement
        : XML_EMPTY_ELEMENT_END { popLexer(); } ;

    xmlNonEmptyElement
        : XML_END_TAG (elementContent)*
          XML_END_TAG_START QNAME (S)? XML_TAG_END { popLexer(); } ;
then everything works fine: it no longer looks ahead past the '>' at
the ',', the lexer is popped, and the previous lexer returns the comma
correctly.
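
The ordering hazard described above can be sketched in a few lines of Java. This is only a toy simulation, not ANTLR's generated code or its runtime: the class LexerSwitchDemo, its one-token buffer, and the OUTER/CONTENT "lexers" are all hypothetical names made up for illustration. It assumes a parser that caches its look-ahead token and fills that cache from whichever lexer is on top of the stack at the moment of the fetch. It shows the symptom only; it does not explain why the grammar analysis decides to look ahead in one layout and not the other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation; none of these names are ANTLR APIs.
class LexerSwitchDemo {
    // A toy "lexer" that labels one character with the mode that tokenized it.
    interface Lexer { String next(String input, int pos); }

    static final Lexer OUTER = (in, p) -> "outer:" + in.charAt(p);
    static final Lexer CONTENT = (in, p) -> "content:" + in.charAt(p);

    final Deque<Lexer> lexers = new ArrayDeque<>();
    final String input;
    int pos = 0;
    String buffered = null; // one-token look-ahead cache

    LexerSwitchDemo(String input) { this.input = input; }

    // Fetch the next token only if the cache is empty, using the current lexer.
    String la1() {
        if (buffered == null) {
            buffered = lexers.peek().next(input, pos++);
        }
        return buffered;
    }

    // Return the next token and empty the cache.
    String consume() {
        String t = la1();
        buffered = null;
        return t;
    }

    void popLexer() { lexers.pop(); }

    // Tokenize ">," with the content lexer active, popping it after the '>'.
    // peekBeforePop models look-ahead happening before the popLexer() action.
    static List<String> run(boolean peekBeforePop) {
        LexerSwitchDemo p = new LexerSwitchDemo(">,");
        p.lexers.push(OUTER);
        p.lexers.push(CONTENT);
        List<String> out = new ArrayList<>();
        out.add(p.consume());        // '>' comes from the content lexer
        if (peekBeforePop) {
            p.la1();                 // ',' is fetched while CONTENT is still active
        }
        p.popLexer();                // the action runs here
        out.add(p.consume());        // ',' comes from the cache or from OUTER
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [content:>, content:,]
        System.out.println(run(false)); // [content:>, outer:,]
    }
}
```

In the first run the comma is already sitting in the cache when the lexer is popped, so the pop comes too late; in the second run the pop happens before anything asks for the next token.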
Why should moving the action code make a difference? This is really
annoying, and it took dumb luck and a lot of trial and error to figure
out how to work around it.
- Paul