[antlr-interest] Lexing XQuery in antlr 3
Jim Idle
jimi at temporal-wave.com
Thu Sep 24 09:49:59 PDT 2009
On 09/24/2009 09:12 AM, Josh Spiegel wrote:
> Hi,
>
> I am trying to migrate our XQuery lexer from antlr 2 to antlr 3.
>
> The match for a given token depends on the text before or after it.
> For example, in a certain state the string "declare" should be a
> keyword token if it is followed by "namespace" and otherwise it should
> be a QName token. We successfully handled this in antlr 2 using
> syntactic predicates, e.g.:
>
> Keywords :
> ('ancestor-or-self' (C|S1)* '::')=> 'ancestor-or-self' {
> $type = ANCESTOR_OR_SELF_AXIS;
> }
> | ...
> | ('declare' (C|S1)+ 'namespace' Del)=> 'declare' {
> $type = DECLARE;
> }
> |
> ....
> | QName {
> $type = QNAME;
> }
> ;
>
>
> Note: C and S1 are fragments that match comments and whitespace
> respectively. We also intermix some gated semantic predicates to
> disable certain keywords depending on the state of the lexer (a state
> that we maintain). I have omitted that code for brevity.
>
> The rule is pretty long as there are many keywords in XQuery.
> Unfortunately, in antlr3 the method specialStateTransition associated
> with this Keywords rule exceeds the 64K limit and I get the Java "code
> too large" error. I have looked at composite grammars and searched
> many of the "code too large" postings. Is there a way to break up
> this kind of rule?
Move this into the parser rather than the lexer, and create an id rule
that also accepts the keyword tokens as identifiers.
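The parser-side approach can be sketched in plain Java so that it runs without the ANTLR runtime. Everything in it is hypothetical: the token names, the `Token` class, and the simplified `statement()` rule that stands in for a real XQuery prolog rule. The lexer is assumed to always emit `DECLARE` and `NAMESPACE` for those words. The parser then uses two tokens of lookahead to decide whether they start a namespace declaration, and the `id()` rule accepts any keyword token wherever a name is expected.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class IdRuleSketch {
    // Hypothetical token types; a real XQuery grammar defines many more keywords.
    enum Tok { DECLARE, NAMESPACE, ANCESTOR_OR_SELF_AXIS, QNAME, EQ, STRING, EOF }

    static final class Token {
        final Tok type;
        final String text;
        Token(Tok type, String text) { this.type = type; this.text = text; }
    }

    // id : QNAME | DECLARE | NAMESPACE | ANCESTOR_OR_SELF_AXIS ;
    // Every keyword token is also acceptable where a plain name is allowed.
    static final Set<Tok> ID_TOKENS =
        EnumSet.of(Tok.QNAME, Tok.DECLARE, Tok.NAMESPACE, Tok.ANCESTOR_OR_SELF_AXIS);

    private final List<Token> tokens;
    private int pos = 0;

    IdRuleSketch(List<Token> tokens) { this.tokens = tokens; }

    // Token type i positions ahead (1-based, like ANTLR's LA); EOF past the end.
    Tok la(int i) {
        int idx = pos + i - 1;
        return idx < tokens.size() ? tokens.get(idx).type : Tok.EOF;
    }

    private Token match(Tok expected) {
        if (la(1) != expected) {
            throw new IllegalStateException("expected " + expected + " but got " + la(1));
        }
        return tokens.get(pos++);
    }

    String id() {
        if (!ID_TOKENS.contains(la(1))) {
            throw new IllegalStateException("expected a name but got " + la(1));
        }
        return tokens.get(pos++).text;
    }

    // Hypothetical, simplified rule:
    //   statement : DECLARE NAMESPACE id EQ STRING | id ;
    String statement() {
        if (la(1) == Tok.DECLARE && la(2) == Tok.NAMESPACE) {
            match(Tok.DECLARE);
            match(Tok.NAMESPACE);
            String prefix = id();
            match(Tok.EQ);
            String uri = match(Tok.STRING).text;
            return "namespace " + prefix + " = " + uri;
        }
        return "name " + id();
    }
}
```

In an ANTLR 3 grammar the same idea is just a parser rule whose alternatives list `QNAME` and every keyword token, so the lexer no longer needs the long predicated `Keywords` rule.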
However, if you keep it in the lexer, I think the complexity of your
grammar would be greatly reduced if you put the logic in the identifier
rule and did the keyword check yourself with a hash table/map:
QName: ('a'..'z'|'_')+ { $type = lookup($text); } ;
Then create a map of the keywords, and in lookup(), when the text is in
the map, manually examine the following characters with LA() to decide
whether it really is a keyword.
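A minimal sketch of such a lookup(), assuming it would live in the lexer's members and be called from the QName action. To keep it runnable without the ANTLR runtime, `la()` is a hypothetical stand-in for `input.LA()` over a plain string. The token constants, the `KEYWORDS` map and `isDelimiter()` (standing in for the `Del` fragment, whose definition was not posted) are assumptions too. Comments (the `C` fragment) are not skipped, and only two of the keywords from the question are shown.

```java
import java.util.Map;

public class KeywordLookupSketch {
    // Hypothetical token type constants; a generated ANTLR 3 lexer defines its own.
    static final int QNAME = 4;
    static final int DECLARE = 5;
    static final int ANCESTOR_OR_SELF_AXIS = 6;

    // Same value as ANTLR 3's CharStream.EOF.
    static final int EOF = -1;

    // Words that are keywords only in the right context, mapped to their keyword type.
    static final Map<String, Integer> KEYWORDS = Map.of(
        "declare", DECLARE,
        "ancestor-or-self", ANCESTOR_OR_SELF_AXIS);

    private final String input;
    private final int tokenEnd; // index just past the QName text that was matched

    KeywordLookupSketch(String input, int tokenEnd) {
        this.input = input;
        this.tokenEnd = tokenEnd;
    }

    // Stand-in for input.LA(i): LA(1) is the first character after the QName.
    int la(int i) {
        int idx = tokenEnd + i - 1;
        return idx < input.length() ? input.charAt(idx) : EOF;
    }

    private static boolean isSpace(int c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Hypothetical stand-in for Del: any character that cannot continue a name.
    private static boolean isDelimiter(int c) {
        return c == EOF || !(Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.');
    }

    // Lookahead offset of the first non-whitespace character at or after 'from'.
    private int skipSpace(int from) {
        int i = from;
        while (isSpace(la(i))) i++;
        return i;
    }

    private boolean lookingAt(int from, String word) {
        for (int k = 0; k < word.length(); k++) {
            if (la(from + k) != word.charAt(k)) return false;
        }
        return true;
    }

    // Called from the QName action: { $type = lookup($text); }
    int lookup(String text) {
        Integer keyword = KEYWORDS.get(text);
        if (keyword == null) return QNAME; // never a keyword
        int next = skipSpace(1);
        if (keyword == DECLARE) {
            // 'declare' S+ 'namespace' Del
            boolean sawSpace = next > 1;
            boolean ok = sawSpace && lookingAt(next, "namespace")
                && isDelimiter(la(next + "namespace".length()));
            return ok ? DECLARE : QNAME;
        }
        if (keyword == ANCESTOR_OR_SELF_AXIS) {
            // 'ancestor-or-self' S* '::'
            return lookingAt(next, "::") ? ANCESTOR_OR_SELF_AXIS : QNAME;
        }
        return QNAME;
    }
}
```

Since the keyword test becomes a hash lookup plus a few characters of hand-written lookahead, adding keywords should grow the map and lookup() rather than a generated prediction DFA. Note that XQuery names can contain '-', so a real QName rule would need to allow it for "ancestor-or-self" to reach lookup() as one token.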
XQuery is yet another language designed by people who don't understand
languages, I am afraid. Unless there is a real reason this must be done
in the lexer, transfer it to the parser; I think you will have better
luck.
Jim