[antlr-interest] XQuery grammar/parser open source
Martin Probst
mail at martin-probst.com
Sun May 18 12:07:01 PDT 2008
Hi all,
I've released my XQuery parser as open source. It's available here:
http://code.google.com/p/xqpretty/
... and also linked from the ANTLR grammars page. It's a fairly
complete grammar - it parses all of the >10,000 official test cases
and some really nasty parts. Apart from parsing XQuery (and being a
pretty printer), it might be interesting to other ANTLRers for its
technique for parsing the syntactically different parts of the language.
The idea is that I have one single grammar file for the whole
language, but different lexers for the regular XQuery part and the
embedded XML literals. Because these can be nested, the parser pushes
lexers onto an internal stack and pops them off again. This also
required implementing a lazy token stream, so that each token is only
lexed once the parser has selected the lexer for it. The result is a
bit fragile with respect to parser lookahead: any token the parser
looks at beyond a switch point would be produced by the wrong lexer.
But as long as the different lexical sections are entered/left with a
single, non-ambiguous token, everything is fine.
For example, this is the XML part of the grammar:
    dirElemConstructor
        : OPEN_ANGLE { pushXMLLexer(); } qNameOrIdent dirAttributeList
          ( EMPTY_CLOSE_TAG
          | CLOSE_ANGLE dirElemContent CLOSE_TAG qNameOrIdent S? CLOSE_ANGLE
          )
          { popXMLLexer(); } /* ws: explicit */
        ;
dirElemContent leads to this rule:
    elemEnclosedExpr
        : LCURLY { pushXQueryLexer(); } expr RCURLY { popXQueryLexer(); }
        ;
This might also be useful for other languages that have recursively
nested statements in different syntaxes/languages.
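To make the push/pop idea concrete without ANTLR, here is a minimal
sketch in plain Java. It is not the xqpretty code: the names
LexerStackSketch, LazyTokenStream and tokenize, the token types, and
the two toy lexers are all assumptions made up for illustration. The
"XML" lexer treats everything up to a `{` as one TEXT token, the
"XQuery" lexer skips spaces and splits words and `+`, and the driver
stands in for the parser actions by switching lexers on `{` and `}`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a stack of lexers behind a lazy token stream.
class LexerStackSketch {

    static final class Token {
        final String type, text;
        Token(String type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    /** A lexer reads exactly one token at in.pos and advances in.pos. */
    interface Lexer { Token next(LazyTokenStream in); }

    /** Lexes on demand, always with the lexer currently on top of the stack. */
    static final class LazyTokenStream {
        final String input;
        int pos = 0;
        private final Deque<Lexer> lexers = new ArrayDeque<>();

        LazyTokenStream(String input, Lexer initial) {
            this.input = input;
            lexers.push(initial);
        }
        void pushLexer(Lexer l) { lexers.push(l); }
        void popLexer() { lexers.pop(); }

        Token nextToken() {
            if (pos >= input.length()) return new Token("EOF", "");
            return lexers.peek().next(this);
        }
    }

    // Toy "XML content" lexer: '{' opens an enclosed expression,
    // anything else up to the next '{' is one TEXT token.
    static final Lexer XML = in -> {
        if (in.input.charAt(in.pos) == '{') {
            in.pos++;
            return new Token("LCURLY", "{");
        }
        int start = in.pos;
        while (in.pos < in.input.length() && in.input.charAt(in.pos) != '{') in.pos++;
        return new Token("TEXT", in.input.substring(start, in.pos));
    };

    // Toy "XQuery" lexer: skips spaces, recognizes '}', '+', and words.
    static final Lexer XQUERY = in -> {
        while (in.pos < in.input.length() && in.input.charAt(in.pos) == ' ') in.pos++;
        if (in.pos >= in.input.length()) return new Token("EOF", "");
        char c = in.input.charAt(in.pos);
        if (c == '}') { in.pos++; return new Token("RCURLY", "}"); }
        if (c == '+') { in.pos++; return new Token("PLUS", "+"); }
        int start = in.pos;
        while (in.pos < in.input.length()
                && Character.isLetterOrDigit(in.input.charAt(in.pos))) in.pos++;
        if (in.pos == start) in.pos++; // unknown character: consume it to make progress
        return new Token("WORD", in.input.substring(start, in.pos));
    };

    /** Stand-in for the parser: switches lexers right after '{' and '}'. */
    static List<String> tokenize(String input) {
        LazyTokenStream in = new LazyTokenStream(input, XML);
        List<String> out = new ArrayList<>();
        for (Token t = in.nextToken(); !t.type.equals("EOF"); t = in.nextToken()) {
            out.add(t.toString());
            if (t.type.equals("LCURLY")) in.pushLexer(XQUERY); // enter expression
            if (t.type.equals("RCURLY")) in.popLexer();        // back to content
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a b{1 + 2}c d"));
    }
}
```

Running main prints
[TEXT(a b), LCURLY({), WORD(1), PLUS(+), WORD(2), RCURLY(}), TEXT(c d)].
The spaces in "a b" survive as text, while the ones in "1 + 2" are
skipped, because each token is lexed only after the preceding switch
has happened. A stream that tokenized the whole input up front could
not do this. The sketch only nests in one direction; a real grammar
would also push the XML lexer from inside an expression.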
Best Regards,
Martin