[antlr-interest] Question about lexer/parser boundaries
Phil Oliver
antlr at olivercomputing.com
Mon Jun 4 13:47:50 PDT 2007
Jim - thanks for the quick response. I would note a couple of things.
First, "merging" the tokens at the lexer stage seems to be an
effective and indeed necessary technique to accommodate the "grouping"
notation in the XQuery 1.0 grammar. That is, in some parser rule there
might be a reference to (to use my prior example for continuity):
... < TOKEN1 TOKEN2 > ...
in the XQuery grammar, denoting that TOKEN1 and TOKEN2 are to be
treated as one unit. I think this is done in order to keep the
grammar LL(1) parsable. ANTLR itself doesn't (unless I'm missing it)
have such an ability, and sub-rule grouping in parentheses is
apparently not equivalent. The only option I see is to define
another lexer rule, as in my example:
MULTIPLE: TOKEN1 TOKEN2;
and then up in the parser rules, < TOKEN1 TOKEN2 > can be replaced
with MULTIPLE. This appears to work as expected. (Concrete examples
are 'DECLARE boundary-space' vs. 'DECLARE default' vs. 'DECLARE
namespace' etc. Unless you lex each one as a single unit, the parser
needs LL(2) to distinguish between them. Correct me if I'm wrong
here. Yes, I understand that ANTLR 3.0 is LL(*) and can backtrack, but
I want to keep this LL(1), as intended by the official grammar.)
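
To make the lookahead point concrete, here is a rough sketch in Python
rather than ANTLR. It is not how ANTLR analyses a grammar; it only
counts how many leading tokens are needed to tell three sample
declarations apart, first with every word lexed separately and then
with 'declare' merged with the following keyword (the same idea as the
MULTIPLE rule). The names lex_separate, lex_merged, min_lookahead and
the sample strings are all made up for illustration.

```python
import re

# Illustrative sample inputs, one per alternative (assumed, not from the spec text).
SAMPLES = [
    'declare boundary-space preserve',
    'declare default element namespace "http://example.org"',
    'declare namespace p = "http://example.org"',
]

# 'declare' followed by one of the keywords is matched as a single token;
# anything else falls through to one token per whitespace-separated word.
MERGED = re.compile(r'declare\s+(boundary-space|default|namespace)(?!\S)|\S+')


def lex_separate(text):
    """One token per whitespace-separated word."""
    return text.split()


def lex_merged(text):
    """Like lex_separate, but 'declare <keyword>' becomes one token."""
    tokens = []
    for m in MERGED.finditer(text):
        if m.group(1):
            tokens.append('declare ' + m.group(1))
        else:
            tokens.append(m.group(0))
    return tokens


def min_lookahead(token_lists):
    """Smallest k such that the first k tokens differ for every input, or None."""
    longest = max(len(t) for t in token_lists)
    for k in range(1, longest + 1):
        prefixes = [tuple(t[:k]) for t in token_lists]
        if len(set(prefixes)) == len(prefixes):
            return k
    return None


separate_k = min_lookahead([lex_separate(s) for s in SAMPLES])
merged_k = min_lookahead([lex_merged(s) for s in SAMPLES])
print(separate_k, merged_k)
```

With separate tokens all three inputs start with the same 'declare'
token, so one token of lookahead cannot choose between them; with the
merged tokens the first token alone is enough.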
I'm actually more concerned about my first examples with the
character ranges than about the "merging" idea, though for
completeness I wanted to include it in my question.