[antlr-interest] Sparql Grammar & Huge C Files

Sat Aug 20 10:03:53 PDT 2011

I followed your instructions and successfully compiled the lexer into a static library. The library file is 82 MB, which is still quite large for my needs. I will try to rewrite or simplify the grammar.

Thank you very much for your support!

Todor

On Aug 20, 2011, at 6:13 PM, Jim Idle wrote:

> The lexer rules:
> 
> BLANK_NODE_LABEL : '_:' t=PN_LOCAL { setText($t.text); };
> 
> VAR1 : QUESTION_MARK v=VARNAME { setText($v.text); };
> 
> VAR2 : '$' v=VARNAME { setText($v.text); };
> 
> 
> are coded for Java, not C. You cannot simply change the target language
> when the grammar contains embedded Java code.
> 
> All the lexer rules are specified in the form ('E'|'e') etc., which generates
> bigger tables than the other ways of implementing case insensitivity
> explained on the wiki. Also, the grammar has a lot of rules that it just
> leaves ANTLR to sort out. That is fair enough, but it is much better to
> left-factor the rules and change the $type once you know what the token is;
> all the numeric rules are an example.
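Both ideas can be illustrated in plain C. The case-insensitivity approach assumed here makes the lookahead itself fold case, so each rule mentions 'E' only instead of ('E'|'e'); left-factoring means matching the shared digit prefix once and deciding the token type afterwards. This is a hand-written sketch, not ANTLR output: the Stream type, LA() helper and TOK_* names are hypothetical, and the numeric shapes are assumed to follow the SPARQL 1.1 INTEGER, DECIMAL and DOUBLE terminals (e.g. `12`, `.5`, `1.e5`, `6.02E23`).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define END_OF_INPUT (-1)

/* Hypothetical token kinds mirroring SPARQL's numeric terminals. */
typedef enum { TOK_INVALID, TOK_INTEGER, TOK_DECIMAL, TOK_DOUBLE } TokenType;

/* Hypothetical toy input stream; not the ANTLR3 C runtime's stream type. */
typedef struct {
    const char *text;
    size_t pos;   /* index of the next unconsumed character */
    size_t len;
} Stream;

static Stream stream_of(const char *text) {
    Stream s = { text, 0, strlen(text) };
    return s;
}

/* Case-folding lookahead: the i-th upcoming character (1-based), upper-cased.
   Rules then test for 'E' only instead of ('E'|'e'); the text is unchanged. */
static int LA(const Stream *s, size_t i) {
    size_t idx = s->pos + i - 1;
    if (idx >= s->len) return END_OF_INPUT;
    return toupper((unsigned char)s->text[idx]);
}

static int is_digit(int c) { return c >= '0' && c <= '9'; }

/* One left-factored rule for all numbers: the digits are matched once and the
   token type is upgraded (INTEGER -> DECIMAL -> DOUBLE) as more is seen,
   instead of three separate rules that each start by matching digits. */
static TokenType scan_number(Stream *s) {
    size_t k = 1;                  /* lookahead offset of the next char */
    size_t int_digits = 0, frac_digits = 0, int_end;
    TokenType type = TOK_INTEGER;

    while (is_digit(LA(s, k))) { k++; int_digits++; }
    int_end = k;
    if (LA(s, k) == '.') {
        k++;
        while (is_digit(LA(s, k))) { k++; frac_digits++; }
        type = TOK_DECIMAL;
    }
    if (int_digits + frac_digits == 0) return TOK_INVALID;  /* nothing consumed */

    if (LA(s, k) == 'E') {         /* also matches 'e', thanks to LA() */
        size_t e = k + 1;
        if (LA(s, e) == '+' || LA(s, e) == '-') e++;
        if (is_digit(LA(s, e))) {  /* only a complete exponent is consumed */
            while (is_digit(LA(s, e))) e++;
            k = e;
            type = TOK_DOUBLE;
        }
    }
    /* "12." with no fraction or exponent: leave the '.' for the next token. */
    if (type == TOK_DECIMAL && frac_digits == 0) {
        k = int_end;
        type = TOK_INTEGER;
    }
    s->pos += k - 1;               /* consume the matched characters */
    return type;
}
```

In an ANTLR grammar the same idea would be a single numeric rule whose actions assign $type, rather than several rules that each re-match the digits and leave ANTLR to build a large decision between them.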
> 
> The parser grammar will just work, but it is naturally a big one. You
> might contact the authors about it. There are probably many ways it
> could be made more efficient, but because the tables are all static, that
> does not matter much in C. Once it is compiled, look at the size of the
> data segment; that is a better indicator than the size of the source code,
> which contains lots of annotations.
> 
> Finally, look at the generated code, find the decisions that produce large
> decision trees, and check the corresponding rules for possible
> optimizations. Either way, fix up the SETTEXT and it will just work.
> 
> To fix the SETTEXT, I would not do what they are doing at all, but simply
> advance the start pointer in the token by 1 or 2 characters when (or if)
> you use it, or within the lexer code if you must. That is trivial and
> performs better. In other words, just take the setText() actions out altogether.
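The fix amounts to leaving the token text where it is and skipping the prefix (`_:`, `?` or `$`) when the name is read. A minimal C sketch, assuming a hypothetical Token struct that holds inclusive start/stop pointers into the input buffer; it mimics the idea only and is not the ANTLR3 C runtime's token type, so check the runtime headers for the real field and accessor names. The token-type constants are also illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative token types named after the grammar rules (values are arbitrary). */
enum { BLANK_NODE_LABEL = 1, VAR1, VAR2 };

/* Hypothetical token: its text is the inclusive range [start, stop] inside the
   original input buffer, so reading it never copies anything. */
typedef struct {
    int type;
    const char *start;
    const char *stop;
} Token;

/* Characters to skip: "_:" for blank nodes, "?" or "$" for variables. */
static size_t prefix_len(int type) {
    switch (type) {
    case BLANK_NODE_LABEL: return 2;
    case VAR1:
    case VAR2: return 1;
    default: return 0;
    }
}

/* Replaces setText(): return a pointer just past the prefix and the remaining
   length, leaving the token and the input buffer untouched. */
static const char *token_name(const Token *t, size_t *len) {
    size_t full = (size_t)(t->stop - t->start) + 1;
    size_t skip = prefix_len(t->type);
    if (skip > full) skip = full;   /* defensive: never step past the token */
    *len = full - skip;
    return t->start + skip;
}

/* Compare the unprefixed name with a C string, again without a copy. */
static int token_name_equals(const Token *t, const char *name) {
    size_t n;
    const char *p = token_name(t, &n);
    return strlen(name) == n && memcmp(p, name, n) == 0;
}
```

Because nothing is copied, the cost of stripping the prefix is a pointer addition at the point of use.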
> 
> Don't forget to search the list archives at antlr.markmail.org.
> 
> 
> Jim
> 
> 
>> -----Original Message-----
>> From: Todor Dimitrov [mailto:todor.dimitrov at stud.uni-due.de]
>> Sent: Saturday, August 20, 2011 8:53 AM
>> To: Jim Idle
>> Subject: Re: [antlr-interest] Sparql Grammar & Huge C Files
>> 
>> Hi Jim,
>> 
>> this is an open source grammar for the Sparql language that I did not
>> write. I ran the ANTLR tool like this:
>> 
>> java -Xms1024m -Xmx1024m -cp antlr-3.4-complete.jar org.antlr.Tool Sparql.g
>> 
>> The tool printed no warnings, and looking at the ANTLR tool options,
>> I don't see any switches that would enable or disable warning generation.
>> I'm not using the SETTEXT macro, and I'm not quite sure where to use it.
>> Are there any examples of it? In addition, the Sparql grammar contains
>> only rewrite rules, so I'm not sure whether I need the SETTEXT macro
>> at all. I've attached the grammar file for reference.
>> 
>> Todor
> 