[antlr-interest] Solution for specialStateTransition exceeding 65k
Jim Idle
jimi at temporal-wave.com
Thu May 27 08:00:48 PDT 2010
You could look at the JavaFX lexer. JavaFX allows expressions in strings in a similar manner, but I did not need to use so many predicates, so it would probably help you. Find the JavaFX project on Kenai and you can download the source code. Just search for *.g and you will find the lexer.
Jim
> -----Original Message-----
> From: Marcus Klimstra [mailto:mgb.klimstra at gmail.com]
> Sent: Thursday, May 27, 2010 7:58 AM
> To: Jim Idle
> Subject: Re: [antlr-interest] Solution for specialStateTransition
> exceeding 65k
>
> Hi Jim,
>
> Basically the language has string literals which can contain
> 'placeholders'; expressions surrounded by angle brackets:
>
> stringLiteral
> : SQUOTE! stringPart* SQUOTE!
> ;
>
> stringPart
> : STRCONT
> | LT! expr XGT!
> ;
>
> expr can also be a string, so 'foo <bar('baz')> quux' would be a valid
> expression. The only exception is that '> is not allowed within
> placeholders.
>
> The lexer handles this with a stack of 'modes'. All operators and
> keywords have a predicate that the current mode must be 'normal' (i.e.
> outside a string or in a placeholder). When inside a placeholder the
> '>' character yields a XGT token instead of the normal GT, to prevent
> it from being gobbled up by a relational expression.
>
> PLUS  : {inNormal}?=> '+' ;
> MINUS : {inNormal}?=> '-' ;
> MUL   : {inNormal}?=> '*' ;
> DIV   : {inNormal}?=> '/' ;
> MOD   : {inNormal}?=> '%' ;
> //etc
> NOT   : {inNormal}?=> 'not' ;
> OR    : {inNormal}?=> 'or' ;
> AND   : {inNormal}?=> 'and' ;
> TRUE  : {inNormal}?=> 'true' ;
> FALSE : {inNormal}?=> 'false' ;
> //etc
>
> SQUOTE
> : {inNormal}?=> '\'' { pushMode(MODE_STRING); }
> | {inString}?=> '\'' { popMode(); }
> ;
>
> XGT : {inPlaceholder}?=> '>' { popMode(); }
> ;
>
> GT : {inNormal}?=> '>'
> ;
>
> LT : '<' { if (inString) { pushMode(MODE_NORMAL); } }
> ;
>
> STRCONT
> : {inString}?=> ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'_')+
> ;
>
> As you can see, at the moment strings can only contain /[a..z][0..9]
> _/i, since using (~('\''|'<'))+ results in an OutOfMemoryError...
>
> inNormal, inString and inPlaceholder are booleans which are updated by
> pushMode and popMode:
>
> private void updateMode() {
>     Integer mode = stack.peekFirst();
>     inNormal = (stack.isEmpty() || mode == MODE_NORMAL);
>     inString = (mode == MODE_STRING);
>     inPlaceholder = (mode == MODE_NORMAL);
> }
>
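The mode bookkeeping described in the quoted message can be modelled outside ANTLR as a small stand-alone Java class. This is only a rough sketch under assumptions: the class name ModeStack, the trace helper, and the constant values are invented for illustration, and the code follows the quoted rules for SQUOTE, LT, XGT and GT rather than any generated lexer. It also guards against peekFirst() returning null on an empty stack, which would otherwise throw on unboxing if the mode constants are primitive ints.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-alone model of the lexer's mode stack; not generated ANTLR code. */
class ModeStack {
    // Assumed values; the original post does not show the constants.
    static final int MODE_NORMAL = 0;
    static final int MODE_STRING = 1;

    private final Deque<Integer> stack = new ArrayDeque<>();
    boolean inNormal = true;        // empty stack counts as normal mode
    boolean inString = false;
    boolean inPlaceholder = false;

    void pushMode(int mode) { stack.addFirst(mode); updateMode(); }

    void popMode() { stack.removeFirst(); updateMode(); }

    private void updateMode() {
        Integer mode = stack.peekFirst();   // null when the stack is empty
        inNormal = (mode == null || mode == MODE_NORMAL);
        inString = (mode != null && mode == MODE_STRING);
        inPlaceholder = (mode != null && mode == MODE_NORMAL);
    }

    /**
     * Walks the input and reports which token the quote and angle-bracket
     * characters would produce under the quoted rules. All other characters
     * are skipped, so this is not a real tokenizer.
     */
    static String trace(String input) {
        ModeStack m = new ModeStack();
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            String token = null;
            if (c == '\'') {
                token = "SQUOTE";
                if (m.inNormal) m.pushMode(MODE_STRING);   // opening quote
                else m.popMode();                          // closing quote
            } else if (c == '<') {
                token = "LT";
                if (m.inString) m.pushMode(MODE_NORMAL);   // enter placeholder
            } else if (c == '>') {
                if (m.inPlaceholder) { token = "XGT"; m.popMode(); }
                else if (m.inNormal) token = "GT";
            }
            if (token != null) {
                if (out.length() > 0) out.append(' ');
                out.append(token);
            }
        }
        return out.toString();
    }
}
```

Calling ModeStack.trace("'foo <bar('baz')> quux'") walks the example from the message: the '>' that closes the placeholder comes out as XGT, while a '>' seen with an empty stack comes out as GT.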
> Although my current approach seems to work pretty well, I am of course
> open to suggestions. I can't really wait for ANTLR v4 however :)
>
> Thanks,
>
> - Marcus
>
>
> On Thu, May 27, 2010 at 3:50 PM, Jim Idle <jimi at temporal-wave.com>
> wrote:
> >
> > There is quite often a way to rejig the lexer to avoid the huge
> > expansion, if you post your grammar, maybe we can help. I think that
> > such issues will go away in v4 :-)
> >
> > Jim
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > bounces at antlr.org] On Behalf Of Marcus Klimstra
> > > Sent: Thursday, May 27, 2010 2:19 AM
> > > To: antlr-interest at antlr.org
> > > Subject: [antlr-interest] Solution for specialStateTransition
> > > exceeding 65k
> > >
> > > Hi,
> > >
> > > I ran into the problem of the huge specialStateTransition bytecode
> > > size when using many gated semantic predicates in the lexer (in all
> > > my lexer rules actually). After a google search I found that this
> > > is a known issue to which there are some workarounds, but no real
> > > solutions. At first I used the workaround to manually add local
> > > variables for the outer-class references, but at some point even
> > > that no longer worked.
> > > Therefore I changed the Java code generator to create separate
> > > methods for each switch-case. This works quite well for me, so I
> > > wanted to share it with the community. Note that I only tested this
> > > in the lexer, since my parser has no specialStateTransition-method
> > > at the moment. I also added annotations to suppress the useless
> > > warnings in the generated code. A diff-file with these changes is
> > > attached.
> > >
> > > - Marcus
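The idea of one method per switch case can be illustrated in plain Java. The JVM limits the bytecode of a single method to under 64 KB, so a switch whose case bodies are all inline can hit that limit, while a switch that only dispatches to small methods stays short. The following is a hedged sketch, not the attached diff and not real ANTLR output: the class name, state numbers, return values, and the inNormal stand-in are all made up for illustration.

```java
/** Hypothetical illustration of moving each switch-case body into its own method. */
class SplitSwitchSketch {
    // Stand-in for a lexer predicate such as inNormal; assumed for this example.
    private boolean inNormal = true;

    // Before: every case body is inline, so all of the code counts
    // toward the size of this one method.
    int transitionInline(int s, int la) {
        switch (s) {
            case 0:
                if (la == '+' && inNormal) return 10;
                return -1;
            case 1:
                if (la == '-' && inNormal) return 11;
                return -1;
            default:
                return -1;
        }
    }

    // After: the switch only dispatches; each case body lives in its own
    // method and is subject to the size limit separately.
    int transitionSplit(int s, int la) {
        switch (s) {
            case 0:  return state0(la);
            case 1:  return state1(la);
            default: return -1;
        }
    }

    private int state0(int la) {
        if (la == '+' && inNormal) return 10;
        return -1;
    }

    private int state1(int la) {
        if (la == '-' && inNormal) return 11;
        return -1;
    }
}
```

Both variants return the same values for the same inputs; only the distribution of bytecode across methods changes.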
> >