[antlr-interest] Lexing XQuery in antlr 3

Thu Sep 24 09:12:52 PDT 2009

Hi,

I am trying to migrate our XQuery lexer from antlr 2 to antlr 3.

Which token a piece of text should produce depends on the text before or
after it.  For example, in a certain lexer state the string "declare" should
be a keyword token if it is followed by "namespace"; otherwise it should be
a QName token.  We handled this successfully in antlr 2 using syntactic
predicates, e.g.:

Keywords :
      ('ancestor-or-self' (C|S1)* '::')=> 'ancestor-or-self' {
        $type = ANCESTOR_OR_SELF_AXIS;
      }
    | ...
    | ('declare' (C|S1)+ 'namespace' Del)=> 'declare' {
        $type = DECLARE;
      }
    |
    ....
    | QName {
        $type = QNAME;
      }
    ;

Note: C and S1 are fragments that match comments and whitespace
respectively.  We also intermix some gated semantic predicates to disable
certain keywords depending on the state of the lexer (a state that we
maintain).  I have omitted that code for brevity.
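
To make the intended behaviour concrete, the lookahead decision for two of
the alternatives can be sketched in plain Java, outside ANTLR.  This is only
an illustration under stated assumptions, not generated lexer code: the class
KeywordLookahead, the classify method and the TokenType enum are hypothetical
names, comments (C) are not handled, and Del is assumed to mean "end of input
or a non-name character", which the rule above does not actually specify.

```java
// Hypothetical illustration of the keyword-vs-QName lookahead; not ANTLR output.
class KeywordLookahead {
    enum TokenType { ANCESTOR_OR_SELF_AXIS, DECLARE, QNAME }

    // Stand-in for the S1 fragment: skips whitespace only (comments are ignored here).
    static int skipWhitespace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // ASSUMPTION: a rough approximation of the characters allowed inside a name.
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.';
    }

    // Decide the token type for the text starting at pos, using lookahead only.
    static TokenType classify(String input, int pos) {
        String axis = "ancestor-or-self";
        if (input.startsWith(axis, pos)) {
            // (C|S1)* : zero or more separators, then '::'
            int i = skipWhitespace(input, pos + axis.length());
            if (input.startsWith("::", i)) {
                return TokenType.ANCESTOR_OR_SELF_AXIS;
            }
        }
        String kw = "declare";
        if (input.startsWith(kw, pos)) {
            int afterKw = pos + kw.length();
            int i = skipWhitespace(input, afterKw);
            // (C|S1)+ : at least one separator is required before 'namespace'
            if (i > afterKw && input.startsWith("namespace", i)) {
                int end = i + "namespace".length();
                // ASSUMPTION: Del = end of input or a non-name character
                if (end >= input.length() || !isNameChar(input.charAt(end))) {
                    return TokenType.DECLARE;
                }
            }
        }
        // Anything else in this position is treated as a QName.
        return TokenType.QNAME;
    }

    public static void main(String[] args) {
        System.out.println(classify("declare namespace p = \"u\";", 0)); // DECLARE
        System.out.println(classify("declare/foo", 0));                  // QNAME
        System.out.println(classify("ancestor-or-self :: node()", 0));   // ANCESTOR_OR_SELF_AXIS
    }
}
```

The point of the sketch is only that each keyword's check is a small,
independent piece of lookahead that falls through to QName when it fails.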

The rule is pretty long, as there are many keywords in XQuery.
Unfortunately, in antlr 3 the generated specialStateTransition method for
this Keywords rule exceeds Java's 64K limit on the size of a single method,
and I get the Java "code too large" error.  I have looked at composite
grammars and searched many of the "code too large" postings.  Is there a way
to break up this kind of rule?

Thanks,
Josh