[antlr-interest] Sparql Grammar & Huge C Files

Jim Idle jimi at temporal-wave.com
Sat Aug 20 09:13:42 PDT 2011


The lexer rules:

BLANK_NODE_LABEL : '_:' t=PN_LOCAL { setText($t.text); };

VAR1 : QUESTION_MARK v=VARNAME { setText($v.text); };

VAR2 : '$' v=VARNAME { setText($v.text); };


are coded for Java, not C. You cannot simply change the target language
when the grammar contains embedded Java code.

All the lexer rules are specified as ('E'|'e') etc., which will generate
bigger tables than the other ways to implement case insensitivity that are
explained on the wiki. Also, the grammar has a lot of rules that it has just
left ANTLR to sort out, which is fair enough, but it is much better to left
factor the rules and change the $type once you know what the token is, for
instance with all the numeric rules.
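
A minimal sketch of the case-folding idea, assuming the usual approach of
upper-casing characters only in the lookahead so that keywords need to be
spelled one way while the original text is left untouched. The demo_stream
type and both functions are hypothetical and self-contained; they are not
the ANTLR3 C runtime API.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical character stream (NOT the ANTLR3 C runtime input stream). */
typedef struct {
    const char *data;  /* input buffer, never modified */
    size_t      size;  /* number of characters in data */
    size_t      pos;   /* index of the next unread character */
} demo_stream;

/* Lookahead: return the i-th upcoming character (1-based), folded to
 * upper case, or -1 when i is 0 or past the end of the input. */
int demo_la_upper(const demo_stream *s, size_t i)
{
    if (i == 0 || s->pos + i - 1 >= s->size) {
        return -1;
    }
    return toupper((unsigned char)s->data[s->pos + i - 1]);
}

/* Try to match a keyword that is written in upper case only.
 * Returns 1 and consumes the keyword on success, otherwise returns 0
 * and leaves the stream position unchanged. */
int demo_match_keyword(demo_stream *s, const char *kw)
{
    size_t n = 0;
    while (kw[n] != '\0') {
        if (demo_la_upper(s, n + 1) != (unsigned char)kw[n]) {
            return 0;
        }
        n++;
    }
    s->pos += n;
    return 1;
}
```

With this shape, a keyword is matched against a single upper-case spelling,
so no per-letter ('E'|'e') alternatives are needed, and the token text still
reads exactly as the user typed it.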

The parser grammar will just work, but it is naturally a big one. You
might contact the authors about it. There are probably a lot of ways it
could be made more efficient, but as the tables are all static, it
does not matter that much in C. Look at the size of the data segment once
it is compiled, as this is a better indicator than the size of the source
code, which has lots of annotations.

Finally, look at the code that is output, find the decisions that are
generating large decision trees, and look at the corresponding rules for
any optimizations. However, fix up the SETTEXT and it will just work.

To fix the SETTEXT, I would not do what they are doing at all, but merely
advance the start pointer in the token by 1 or 2 when/if you use it (or
within the lexer code if you must). That is trivial and performs
better. In other words, just take the setText() actions out altogether.
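
A minimal sketch of skipping the prefix at the point of use, assuming a
token that only records where its text starts and stops in the input
buffer. The demo_token type and demo_token_text function are hypothetical
stand-ins for illustration, not the ANTLR3 C runtime token type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token (NOT the ANTLR3 C runtime token type): it points
 * into the input buffer rather than owning a copy of the text. */
typedef struct {
    const char *start;  /* first character of the match */
    const char *stop;   /* last character of the match (inclusive) */
} demo_token;

/* Copy the token text into out, skipping the first `skip` characters
 * (1 for '?' or '$', 2 for "_:"). Returns the number of characters
 * copied, or 0 if the token is empty, is no longer than the prefix,
 * or does not fit in out. */
size_t demo_token_text(const demo_token *tok, size_t skip,
                       char *out, size_t outsz)
{
    size_t len;

    if (outsz == 0) {
        return 0;
    }
    out[0] = '\0';
    if (tok->stop < tok->start) {
        return 0;
    }
    len = (size_t)(tok->stop - tok->start) + 1;
    if (skip >= len || len - skip >= outsz) {
        return 0;
    }
    len -= skip;
    memcpy(out, tok->start + skip, len);
    out[len] = '\0';
    return len;
}
```

For a token covering "?name", calling demo_token_text with skip set to 1
copies "name"; for "_:b1" with skip set to 2 it copies "b1". No text is
rewritten inside the lexer, so no setText() action is needed.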

Don't forget to use antlr.markmail.org


Jim


> -----Original Message-----
> From: Todor Dimitrov [mailto:todor.dimitrov at stud.uni-due.de]
> Sent: Saturday, August 20, 2011 8:53 AM
> To: Jim Idle
> Subject: Re: [antlr-interest] Sparql Grammar & Huge C Files
>
> Hi Jim,
>
> this is an open source grammar for the Sparql language that has not
> been developed by me. I have run the ANTLR tool like this:
>
> java -Xms1024m -Xmx1024m -cp antlr-3.4-complete.jar org.antlr.Tool
> Sparql.g
>
> No warnings have been outputted and looking at the ANTLR tool options,
> I don't see any switches that would enable/disable warnings generation.
> I'm not using the SETTEXT macro and I'm not quite sure where to use it.
> Are there any examples for it? In addition, the Sparql grammar contains
> only rewriting rules so I'm not sure whether I have to use the SETTEXT
> macro. I've attached the grammar file for reference.
>
> Todor

