[antlr-interest] Solution for specialStateTransition exceeding 65k

Jim Idle jimi at temporal-wave.com
Thu May 27 08:00:48 PDT 2010


You could look at the JavaFX lexer. JavaFX allows expressions in strings in a similar manner, but I did not need to use so many predicates, so it would probably help you. Find the JavaFX project on Kenai and download the source code; just search for *.g and you will find the lexer.

Jim



> -----Original Message-----
> From: Marcus Klimstra [mailto:mgb.klimstra at gmail.com]
> Sent: Thursday, May 27, 2010 7:58 AM
> To: Jim Idle
> Subject: Re: [antlr-interest] Solution for specialStateTransition
> exceeding 65k
> 
> Hi Jim,
> 
> Basically the language has string literals which can contain
> 'placeholders', i.e. expressions surrounded by angle brackets:
> 
> stringLiteral
>     :    SQUOTE! stringPart* SQUOTE!
>     ;
> 
> stringPart
>     :    STRCONT
>     |    LT! expr XGT!
>     ;
> 
> expr can also be a string, so 'foo <bar('baz')> quux' would be a valid
> expression. The only exception is that '> is not allowed within
> placeholders.
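The nesting described above is inherently recursive: a string contains placeholders, a placeholder contains an expression, and that expression may itself contain strings. The following is a minimal Java sketch of a recursive-descent recognizer for the `stringLiteral`/`stringPart` rules. It assumes a toy `expr` that is only an identifier, a call `name(args)`, or a nested string literal, because the real `expr` rule is not shown in this thread. The class and method names are hypothetical, and whitespace inside placeholders is not handled.

```java
/** Hypothetical recognizer for strings with <placeholder> expressions (toy expr grammar). */
public class PlaceholderStrings {
    private final String src;
    private int pos = 0;

    private PlaceholderStrings(String src) { this.src = src; }

    /** Returns true if the whole input is one string literal per the rules above. */
    public static boolean matches(String input) {
        PlaceholderStrings p = new PlaceholderStrings(input);
        try {
            p.stringLiteral();
            return p.pos == p.src.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // '\0' marks end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
    }

    // stringLiteral : SQUOTE stringPart* SQUOTE ;
    private void stringLiteral() {
        expect('\'');
        while (peek() != '\'') {
            if (peek() == '\0') throw new IllegalStateException("unterminated string");
            if (peek() == '<') {          // stringPart : LT expr XGT
                pos++;
                expr();
                expect('>');
            } else {                      // stringPart : STRCONT (any char except ' and <)
                pos++;
            }
        }
        expect('\'');
    }

    // Toy expr : IDENT ( '(' (expr (',' expr)*)? ')' )? | stringLiteral ;
    private void expr() {
        if (peek() == '\'') { stringLiteral(); return; }   // strings nest inside placeholders
        int start = pos;
        while (Character.isLetterOrDigit(peek()) || peek() == '_') pos++;
        if (pos == start) throw new IllegalStateException("expected expression at " + pos);
        if (peek() == '(') {
            pos++;
            if (peek() != ')') {
                expr();
                while (peek() == ',') { pos++; expr(); }
            }
            expect(')');
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("'foo <bar('baz')> quux'")); // true
        System.out.println(matches("'foo <bar('baz')'"));       // false: placeholder never closed
    }
}
```

The call stack of `stringLiteral` and `expr` plays the same role as the explicit mode stack the lexer has to maintain, since a lexer cannot recurse this way on its own.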
> 
> The lexer handles this with a stack of 'modes'. All operators and
> keywords have a predicate requiring the current mode to be 'normal'
> (i.e. outside a string or in a placeholder). When inside a placeholder
> the '>' character yields an XGT token instead of the normal GT, to
> prevent it from being gobbled up by a relational expression.
> 
> PLUS         :    {inNormal}?=>    '+'        ;
> MINUS        :    {inNormal}?=>    '-'        ;
> MUL          :    {inNormal}?=>    '*'        ;
> DIV          :    {inNormal}?=>    '/'        ;
> MOD          :    {inNormal}?=>    '%'        ;
> // etc.
> NOT          :    {inNormal}?=>    'not'      ;
> OR           :    {inNormal}?=>    'or'       ;
> AND          :    {inNormal}?=>    'and'      ;
> TRUE         :    {inNormal}?=>    'true'     ;
> FALSE        :    {inNormal}?=>    'false'    ;
> // etc.
> 
> SQUOTE
>     :    {inNormal}?=>        '\''    { pushMode(MODE_STRING); }
>     |    {inString}?=>        '\''    { popMode(); }
>     ;
> 
> XGT :    {inPlaceholder}?=>   '>'     { popMode(); }
>     ;
> 
> GT  :    {inNormal}?=>        '>'
>     ;
> 
> LT  :                         '<'     { if (inString) { pushMode(MODE_NORMAL); } }
>     ;
> 
> STRCONT
>     :    {inString}?=>        ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'_')+
>     ;
> 
> As you can see, at the moment strings can only contain characters
> matching /[a-z0-9 _]/i, since using (~('\''|'<'))+ results in an
> OutOfMemoryError...
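To show how the mode stack drives tokenization, here is a minimal hand-written Java sketch that applies the same rules as the grammar above. Operators are only recognized in normal mode, `'` pushes or pops string mode, `<` pushes normal mode when inside a string, and `>` yields XGT inside a placeholder and GT elsewhere. It is not ANTLR-generated code. Because it is hand-written, it can scan string content with the negated set `~('\''|'<')+` directly. The class name, the IDENT token, and the string form of the tokens are assumptions for illustration. The sketch also assumes the `MODE_*` constants are plain `int`s.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical hand-written lexer mirroring the mode rules of the grammar above. */
public class ModeLexer {
    static final int MODE_NORMAL = 0, MODE_STRING = 1;

    private final Deque<Integer> stack = new ArrayDeque<>();
    private boolean inNormal = true, inString = false, inPlaceholder = false;

    private void pushMode(int m) { stack.push(m); updateMode(); }
    private void popMode()       { stack.pop();   updateMode(); }

    // An empty stack means top-level normal mode; MODE_NORMAL is only ever
    // pushed when entering a placeholder from inside a string.
    private void updateMode() {
        boolean empty = stack.isEmpty();
        int mode = empty ? MODE_NORMAL : stack.peekFirst();
        inNormal      = empty || mode == MODE_NORMAL;
        inString      = !empty && mode == MODE_STRING;
        inPlaceholder = !empty && mode == MODE_NORMAL;
    }

    public List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (inString && c != '\'' && c != '<') {
                // STRCONT : {inString}?=> ~('\''|'<')+
                int start = i;
                while (i < s.length() && s.charAt(i) != '\'' && s.charAt(i) != '<') i++;
                out.add("STRCONT(" + s.substring(start, i) + ")");
                continue;
            }
            i++;
            if (c == '\'') {                       // SQUOTE: open or close a string
                out.add("SQUOTE");
                if (inString) popMode(); else pushMode(MODE_STRING);
            } else if (c == '<') {                 // LT: opens a placeholder inside a string
                out.add("LT");
                if (inString) pushMode(MODE_NORMAL);
            } else if (c == '>') {                 // XGT closes a placeholder; otherwise GT
                if (inPlaceholder) { out.add("XGT"); popMode(); }
                else if (inNormal) out.add("GT");
            } else if (Character.isLetter(c)) {    // hypothetical IDENT token
                int start = i - 1;
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
                out.add("IDENT(" + s.substring(start, i) + ")");
            } else if (!Character.isWhitespace(c)) {
                out.add("'" + c + "'");            // other one-character operators
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new ModeLexer().tokenize("'foo <bar('baz')> quux'"));
        // [SQUOTE, STRCONT(foo ), LT, IDENT(bar), '(', SQUOTE, STRCONT(baz), SQUOTE, ')', XGT, STRCONT( quux), SQUOTE]
    }
}
```

The `>` inside the placeholder comes out as XGT, while the same character at top level (for example in `a > b`) comes out as GT.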
> 
> inNormal, inString and inPlaceholder are booleans which are updated by
> pushMode and popMode:
> 
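> private void updateMode() {
>     // peekFirst() returns null on an empty stack; comparing null with an
>     // int MODE_* constant would throw a NullPointerException, so test
>     // isEmpty() before every comparison.
>     boolean empty   = stack.isEmpty();
>     Integer mode    = stack.peekFirst();
>     inNormal        = (empty || mode == MODE_NORMAL);
>     inString        = (!empty && mode == MODE_STRING);
>     inPlaceholder   = (!empty && mode == MODE_NORMAL);
> }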
> 
> Although my current approach seems to work pretty well, I am of course
> open to suggestions. I can't really wait for ANTLR v4, however :)
> 
> Thanks,
> 
> - Marcus
> 
> 
> On Thu, May 27, 2010 at 3:50 PM, Jim Idle <jimi at temporal-wave.com>
> wrote:
> >
> > There is quite often a way to rejig the lexer to avoid the huge
> > expansion; if you post your grammar, maybe we can help. I think that
> > such issues will go away in v4 :-)
> >
> > Jim
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > bounces at antlr.org] On Behalf Of Marcus Klimstra
> > > Sent: Thursday, May 27, 2010 2:19 AM
> > > To: antlr-interest at antlr.org
> > > Subject: [antlr-interest] Solution for specialStateTransition
> > > exceeding 65k
> > >
> > > Hi,
> > >
> > > I ran into the problem of the huge specialStateTransition bytecode
> > > size when using many gated semantic predicates in the lexer (in all
> > > my lexer rules, actually). After a Google search I found that this
> > > is a known issue for which there are some workarounds, but no real
> > > solutions. At first I used the workaround of manually adding local
> > > variables for the outer-class references, but at some point even
> > > that no longer worked.
> > > Therefore I changed the Java code generator to create separate
> > > methods for each switch case. This works quite well for me, so I
> > > wanted to share it with the community. Note that I only tested this
> > > in the lexer, since my parser has no specialStateTransition method
> > > at the moment. I also added annotations to suppress the useless
> > > warnings in the generated code. A diff file with these changes is
> > > attached.
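The JVM limits the bytecode of a single method: the `code_length` of a method's Code attribute must be less than 65536 bytes. That is why one enormous `specialStateTransition` switch stops compiling. The idea behind the patch is to give each switch case its own method so the dispatcher itself stays small. The sketch below is hypothetical, with made-up states, target numbers, and predicate flags. It illustrates that pattern and is not the code that ANTLR's Java target or the attached diff actually generates.

```java
/** Hypothetical sketch: a DFA special-state switch split into one small method per case. */
public class SplitSpecialStates {
    // Gated-predicate flags, standing in for the lexer's inNormal/inString fields.
    boolean inNormal = true;
    boolean inString = false;

    /** Dispatcher: every case just delegates, so this method's bytecode stays tiny. */
    int specialStateTransition(int s, int la) {
        switch (s) {
            case 0:  return specialState0(la);
            case 1:  return specialState1(la);
            default: return -1;               // no viable alternative
        }
    }

    // Without splitting, each of these bodies would be inlined into the switch above,
    // and with hundreds of predicated states the single method would exceed the limit.
    private int specialState0(int la) {
        if (la == '\'' && inNormal) return 10; // SQUOTE opening a string
        if (la == '+' && inNormal)  return 11; // PLUS
        return -1;
    }

    private int specialState1(int la) {
        if (la == '\'' && inString) return 12; // SQUOTE closing a string
        if (la != '<' && inString)  return 13; // STRCONT
        return -1;
    }

    public static void main(String[] args) {
        SplitSpecialStates dfa = new SplitSpecialStates();
        System.out.println(dfa.specialStateTransition(0, '+')); // 11
        dfa.inNormal = false;
        dfa.inString = true;
        System.out.println(dfa.specialStateTransition(1, 'x')); // 13
    }
}
```

Each per-state method is compiled and size-checked independently, so the total amount of predicate code can grow well past 64 KB without any single method reaching the limit.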
> > >
> > > - Marcus