[antlr-interest] Sparql Grammar & Huge C Files

Sat Aug 20 10:13:25 PDT 2011

The library may be that big, but look at the load size. It may not be as
big as it looks.

Jim

> -----Original Message-----
> From: Todor Dimitrov [mailto:todor.dimitrov at stud.uni-due.de]
> Sent: Saturday, August 20, 2011 10:04 AM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Sparql Grammar & Huge C Files
>
> I followed your instructions and successfully compiled the lexer to a
> static library. The file size of the library is 82M, which is still
> quite large for my needs. I will try to rewrite/simplify the grammar.
>
> Thank you very much for your support!
>
> Todor
>
>
>
> On Aug 20, 2011, at 6:13 PM, Jim Idle wrote:
>
> > The lexer rules:
> >
> > BLANK_NODE_LABEL : '_:' t=PN_LOCAL { setText($t.text); };
> >
> > VAR1 : QUESTION_MARK v=VARNAME { setText($v.text); };
> >
> > VAR2 : '$' v=VARNAME { setText($v.text); };
> >
> >
> > are coded for Java, not C; you cannot simply change the target
> > language when there is embedded Java code.
> >
> > All the lexer rules are specified as ('E'|'e') etc., which will
> > generate bigger tables than the other ways to implement case
> > insensitivity as explained on the wiki. Also, it has a lot of rules
> > that it has just left ANTLR to sort out, which is fair enough, but it
> > is much better to left factor the rules and change the $type once you
> > know what the token is, for instance in all the numeric rules.
> >
> > The parser grammar will just work, but it is just naturally a big one.
> > You might contact the authors about it. There are probably a lot of
> > ways it could be made more efficient, but as the tables are all
> > static, it does not matter that much in C. Look at the size of
> > the data segment once it is compiled, as this is a better indicator
> > than the size of the source code, which has lots of annotations.
> >
> > Finally, look at the code that is output, find the decisions that
> > are generating large decision trees, and look at the corresponding
> > rules for any optimizations. However, fix up the SETTEXT and it will
> > just work.
> >
> > To fix the SETTEXT I would just not do what they are doing, but merely
> > advance the start pointer in the token by 1 or 2 when/if you use it
> > (or within the lexer code if you must). That is trivial and performs
> > better. In other words, just take the setText() actions out altogether.
> >
> > Don't forget to use antlr.markmail.org
> >
> >
> > Jim
> >
> >
> >> -----Original Message-----
> >> From: Todor Dimitrov [mailto:todor.dimitrov at stud.uni-due.de]
> >> Sent: Saturday, August 20, 2011 8:53 AM
> >> To: Jim Idle
> >> Subject: Re: [antlr-interest] Sparql Grammar & Huge C Files
> >>
> >> Hi Jim,
> >>
> >> this is an open source grammar for the Sparql language that has not
> >> been developed by me. I have run the ANTLR tool like this:
> >>
> >> java -Xms1024m -Xmx1024m -cp antlr-3.4-complete.jar org.antlr.Tool
> >> Sparql.g
> >>
> >> No warnings were output, and looking at the ANTLR tool options, I
> >> don't see any switches that would enable/disable warning generation.
> >> I'm not using the SETTEXT macro and I'm not quite sure where to use it.
> >> Are there any examples for it? In addition, the Sparql grammar
> >> contains only rewriting rules so I'm not sure whether I have to use
> >> the SETTEXT macro. I've attached the grammar file for reference.
> >>
> >> Todor
> >
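
The "advance the start pointer" suggestion quoted above can be sketched
in plain C. This is only an illustration under stated assumptions: the
Token struct, the TOK_* constants and the helper names below are
hypothetical stand-ins, not the ANTLR 3 C runtime's token type or API.
The idea is that the lexer leaves the token covering the full matched
text ("?x", "$y", "_:b1") and the consumer skips the 1- or 2-character
prefix when it reads the name, so no setText() action and no string copy
is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token: a half-open [start, stop) view into the input
 * buffer. This is NOT the ANTLR C runtime's token structure. */
typedef struct {
    int         type;
    const char *start;
    const char *stop;
} Token;

/* Hypothetical token types mirroring the three lexer rules in the thread. */
enum { TOK_VAR1 = 1, TOK_VAR2, TOK_BLANK_NODE_LABEL };

/* Number of leading characters to skip: '?' or '$' is 1, "_:" is 2. */
static size_t prefix_len(int type)
{
    switch (type) {
    case TOK_VAR1:
    case TOK_VAR2:
        return 1;
    case TOK_BLANK_NODE_LABEL:
        return 2;
    default:
        return 0;
    }
}

/* Point *name at the text after the prefix and return its length.
 * Nothing is copied; the result still refers to the input buffer. */
static size_t token_name(const Token *t, const char **name)
{
    assert(t->stop >= t->start);
    size_t len  = (size_t)(t->stop - t->start);
    size_t skip = prefix_len(t->type);
    if (skip > len)
        skip = len;
    *name = t->start + skip;
    return len - skip;
}

/* Compare the prefix-stripped name against a NUL-terminated string. */
static int token_name_equals(const Token *t, const char *expected)
{
    const char *name;
    size_t n = token_name(t, &name);
    return n == strlen(expected) && memcmp(name, expected, n) == 0;
}
```

With an input buffer "?x $y _:b1" and a VAR1 token spanning "?x",
token_name() yields a one-character view of "x". The token itself is
left untouched, so the original text is still available for error
messages.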