[antlr-interest] Solution for specialStateTransition exceeding 65k
Marcus Klimstra
mgb.klimstra at gmail.com
Thu May 27 08:06:34 PDT 2010
Hi Jim,
Basically, the language has string literals that can contain
'placeholders', i.e. expressions surrounded by angle brackets:
stringLiteral
: SQUOTE! stringPart* SQUOTE!
;
stringPart
: STRCONT
| LT! expr XGT!
;
expr can also be a string, so 'foo <bar('baz')> quux' would be a valid
expression. The only exception is that '> is not allowed within
placeholders.
The lexer handles this with a stack of 'modes'. All operators and
keywords have a gated predicate requiring that the current mode be
'normal' (i.e. outside a string, or inside a placeholder). Inside a
placeholder the '>' character yields an XGT token instead of the normal
GT, to prevent it from being gobbled up by a relational expression.
PLUS : {inNormal}?=> '+' ;
MINUS : {inNormal}?=> '-' ;
MUL : {inNormal}?=> '*' ;
DIV : {inNormal}?=> '/' ;
MOD : {inNormal}?=> '%' ;
//etc
NOT : {inNormal}?=> 'not' ;
OR : {inNormal}?=> 'or' ;
AND : {inNormal}?=> 'and' ;
TRUE : {inNormal}?=> 'true' ;
FALSE : {inNormal}?=> 'false' ;
//etc
SQUOTE
: {inNormal}?=> '\'' { pushMode(MODE_STRING); }
| {inString}?=> '\'' { popMode(); }
;
XGT : {inPlaceholder}?=> '>' { popMode(); }
;
GT : {inNormal}?=> '>'
;
LT : '<' { if (inString) { pushMode(MODE_NORMAL); } }
;
STRCONT
: {inString}?=> ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'_')+
;
As you can see, at the moment strings can only contain characters
matching /[a-z0-9 _]/i, since using (~('\''|'<'))+ results in an
OutOfMemoryError...
inNormal, inString and inPlaceholder are booleans which are updated by
pushMode and popMode:
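private void updateMode() {
    // Top of the mode stack; null when the stack is empty.
    Integer mode = stack.peekFirst();
    // An empty stack counts as normal mode.
    inNormal = (stack.isEmpty() || mode == MODE_NORMAL);
    inString = (mode == MODE_STRING);
    // A placeholder is a MODE_NORMAL entry pushed by the LT rule, so it
    // is only distinguishable from top-level normal mode by the stack
    // being non-empty.
    inPlaceholder = (mode == MODE_NORMAL);
}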
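
For readers who want to try the mode bookkeeping outside the lexer, here is a minimal stand-alone sketch. The class name ModeStack, the constant values, and the bodies of pushMode and popMode are assumptions (the message only shows updateMode). Unlike the snippet above, it guards against an empty stack before comparing, which matters if the mode constants are plain ints and the null from peekFirst() would otherwise be unboxed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-alone version of the lexer's mode bookkeeping. */
class ModeStack {
    // Assumed values; the original message does not show the constants.
    static final int MODE_NORMAL = 0;
    static final int MODE_STRING = 1;

    private final Deque<Integer> stack = new ArrayDeque<>();

    // Flags read by the gated predicates; an empty stack means normal mode.
    boolean inNormal = true;
    boolean inString = false;
    boolean inPlaceholder = false;

    void pushMode(int mode) {
        stack.push(mode);
        updateMode();
    }

    void popMode() {
        // Assumption: popping an empty stack is silently ignored.
        if (!stack.isEmpty()) {
            stack.pop();
        }
        updateMode();
    }

    private void updateMode() {
        Integer mode = stack.peekFirst(); // null when the stack is empty
        boolean empty = (mode == null);
        inNormal = empty || mode == MODE_NORMAL;
        inString = !empty && mode == MODE_STRING;
        // A placeholder is MODE_NORMAL pushed from inside a string.
        inPlaceholder = !empty && mode == MODE_NORMAL;
    }
}
```

Walking 'foo <bar('baz')> quux' through this class: the opening quote pushes MODE_STRING, '<' pushes MODE_NORMAL (so both inNormal and inPlaceholder are set), the inner quotes push and pop MODE_STRING, the '>' pops back to the outer string, and the closing quote empties the stack.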
Although my current approach seems to work pretty well, I am of course
open to suggestions. I can't really wait for ANTLR v4, however :)
Thanks,
- Marcus
On Thu, May 27, 2010 at 3:50 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> There is quite often a way to rejig the lexer to avoid the huge expansion, if you post your grammar, maybe we can help. I think that such issues will go away in v4 :-)
>
> Jim
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of Marcus Klimstra
>> Sent: Thursday, May 27, 2010 2:19 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Solution for specialStateTransition exceeding
>> 65k
>>
>> Hi,
>>
>> I ran into the problem of the huge specialStateTransition bytecode size
>> when using many gated semantic predicates in the lexer (in all my lexer
>> rules actually). After a Google search I found that this is a known
>> issue to which there are some workarounds, but no real solutions. At
>> first I used the workaround to manually add local variables for the
>> outer-class references, but at some point even that no longer worked.
>> Therefore I changed the Java code generator to create separate methods
>> for each switch-case. This works quite well for me, so I wanted to
>> share it with the community. Note that I only tested this in the lexer,
>> since my parser has no specialStateTransition-method at the moment. I
>> also added annotations to suppress the useless warnings in the
>> generated code. A diff-file with these changes is attached.
>>
>> - Marcus
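
The workaround in the quoted message rests on the JVM limit of 65535 bytes of bytecode per method: a switch whose case bodies are all inlined can exceed it, while a switch that only dispatches to one small method per case stays short. A minimal sketch of the idea follows; the class name, method names, state numbers and return values are all hypothetical, and this is neither ANTLR's generated code nor the attached diff.

```java
/** Hypothetical illustration of splitting switch cases into methods. */
class SplitSwitchDemo {

    // Before: every case body lives inside one method, so the method's
    // bytecode grows with the number of states and predicates.
    static int transitionInline(int s, boolean inNormal, boolean inString) {
        switch (s) {
            case 0:
                return inNormal ? 10 : (inString ? 11 : -1);
            case 1:
                return inString ? 20 : -1;
            default:
                return -1;
        }
    }

    // After: the switch only dispatches; each case body is its own
    // method with its own bytecode budget.
    static int transitionSplit(int s, boolean inNormal, boolean inString) {
        switch (s) {
            case 0:
                return state0(inNormal, inString);
            case 1:
                return state1(inString);
            default:
                return -1;
        }
    }

    private static int state0(boolean inNormal, boolean inString) {
        return inNormal ? 10 : (inString ? 11 : -1);
    }

    private static int state1(boolean inString) {
        return inString ? 20 : -1;
    }
}
```

Both variants compute the same transitions; only the distribution of bytecode across methods changes.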