[antlr-interest] XQuery grammar/parser open source

Martin Probst mail at martin-probst.com
Sun May 18 12:07:01 PDT 2008


Hi all,

I've released my XQuery parser as open source. It's available here:

http://code.google.com/p/xqpretty/

... and also linked from the ANTLR grammars page. It's a fairly
complete grammar: it parses all of the >10,000 official test cases
and handles some really nasty parts. Apart from parsing XQuery (and
being a pretty printer), it might be interesting to other ANTLRers for
its technique for parsing the syntactically different parts of the
language.

The idea is that I have one single grammar file for the whole  
language, but different lexers for the regular XQuery part and the  
embedded XML literals. Because these can be nested, the parser pushes
lexers onto an internal stack and pops them off again. This also
required implementing a lazy token stream, since the input can't be
tokenized up front when the active lexer changes during parsing. The
result is a bit fragile with respect to parser lookahead, but as long
as the different lexical sections are entered/left with a single,
unambiguous token, everything is fine.
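
To make the lexer stack and the lazy token stream a bit more concrete,
here is a minimal Java sketch. It is not the xqpretty code: every class
name (Input, ModeLexer, XQueryModeLexer, XmlModeLexer, LazyTokenStream,
LexerStackDemo) and the toy tokenization rules are assumptions made up
for illustration, and a hand-written loop stands in for the generated
ANTLR parser and its grammar actions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Shared cursor, so each lexer resumes exactly where the previous one stopped. */
final class Input {
    final String text;
    int pos = 0;
    Input(String text) { this.text = text; }
    boolean atEnd() { return pos >= text.length(); }
}

/** A mode lexer produces exactly one token per call (hypothetical interface). */
interface ModeLexer {
    String nextToken(Input in);
}

/** Toy XQuery mode: whitespace is skipped, names/numbers and punctuation are tokens. */
final class XQueryModeLexer implements ModeLexer {
    public String nextToken(Input in) {
        while (!in.atEnd() && Character.isWhitespace(in.text.charAt(in.pos))) in.pos++;
        if (in.atEnd()) return "EOF";
        int start = in.pos;
        if (Character.isLetterOrDigit(in.text.charAt(in.pos))) {
            while (!in.atEnd() && Character.isLetterOrDigit(in.text.charAt(in.pos))) in.pos++;
        } else {
            in.pos++; // single punctuation character
        }
        return "XQ:" + in.text.substring(start, in.pos);
    }
}

/** Toy XML mode: whitespace is significant, text runs are one token. */
final class XmlModeLexer implements ModeLexer {
    public String nextToken(Input in) {
        if (in.atEnd()) return "EOF";
        int start = in.pos;
        if (in.text.startsWith("</", in.pos) || in.text.startsWith("/>", in.pos)) {
            in.pos += 2;
        } else if ("<>{}".indexOf(in.text.charAt(in.pos)) >= 0) {
            in.pos++;
        } else {
            while (!in.atEnd() && "<>{}/".indexOf(in.text.charAt(in.pos)) < 0) in.pos++;
            if (in.pos == start) in.pos++; // a lone '/' becomes its own token
        }
        return "XML:" + in.text.substring(start, in.pos);
    }
}

/** Lazy stream: a token is lexed only when requested, by whichever lexer is on top. */
final class LazyTokenStream {
    private final Input in;
    private final Deque<ModeLexer> lexers = new ArrayDeque<>();
    LazyTokenStream(String text, ModeLexer initial) {
        in = new Input(text);
        lexers.push(initial);
    }
    void pushLexer(ModeLexer lexer) { lexers.push(lexer); }
    void popLexer() {
        if (lexers.size() == 1) throw new IllegalStateException("cannot pop the initial lexer");
        lexers.pop();
    }
    String next() { return lexers.peek().nextToken(in); }
}

final class LexerStackDemo {
    /** Stand-in for the parser: switches lexers on the single token that enters or leaves a section. */
    static List<String> tokenize(String text) {
        ModeLexer xq = new XQueryModeLexer();
        ModeLexer xml = new XmlModeLexer();
        LazyTokenStream ts = new LazyTokenStream(text, xq);
        List<String> out = new ArrayList<>();
        boolean closing = false; // true between "</" and the matching ">"
        String tok;
        while (!(tok = ts.next()).equals("EOF")) {
            out.add(tok);
            switch (tok) {
                // Simplification: every '<' opens an element; a real parser decides this from grammar context.
                case "XQ:<": case "XML:<": ts.pushLexer(xml); break;
                case "XML:/>": ts.popLexer(); break;
                case "XML:</": closing = true; break;
                case "XML:>": if (closing) { ts.popLexer(); closing = false; } break;
                case "XML:{": ts.pushLexer(xq); break;
                case "XQ:}": ts.popLexer(); break;
                default: break;
            }
        }
        return out;
    }
}
```

Calling LexerStackDemo.tokenize("<a>x y{ 1 }</a>") in this sketch lexes
"x y" as one XML text token (whitespace kept), while the blanks around
"1" inside the curly braces are skipped by the XQuery lexer. Note that
the loop acts on each token before asking for the next one. If it
buffered tokens ahead of a switch, they would already have been lexed
in the old mode, which is the lookahead fragility mentioned above.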

For example, this is the XML part of the grammar:
dirElemConstructor
		:	OPEN_ANGLE { pushXMLLexer(); } qNameOrIdent dirAttributeList
			(EMPTY_CLOSE_TAG | (CLOSE_ANGLE dirElemContent CLOSE_TAG qNameOrIdent S? CLOSE_ANGLE))
			{ popXMLLexer(); }	/* ws: explicit */ ;

dirElemContent leads to this rule:
elemEnclosedExpr
		:	LCURLY { pushXQueryLexer(); } expr RCURLY { popXQueryLexer(); };

This might also be useful for other languages that have recursively  
nested statements in different syntaxes/languages.

Best Regards,
Martin

