[antlr-interest] Sparql Grammar & Huge C Files

Jim Idle jimi at temporal-wave.com
Sat Aug 20 08:36:37 PDT 2011


The huge file size occurs because your lexer/parser is probably trying to
do too much, or is asking ANTLR to do a lot of disambiguation, and the
complex overlaps are generating huge tables. In the case of the parser, I
suspect that you need some single-token predicates to help with keyword
disambiguation; have you removed ALL the warnings that ANTLR generates on
your grammar? If you do not remove all the warnings then this sort of
thing happens a lot, especially with a language as terrible as the one SQL
has morphed into.
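
A single-token predicate for keywords usually comes down to one question:
does the text of the next token match a given keyword? The helper below is
a minimal sketch of that check in plain C, assuming keywords are lexed as
ordinary identifiers and compared without regard to ASCII case. The name
text_is_keyword is hypothetical and is not part of the ANTLR runtime; how
it would be wired into a grammar predicate depends on the grammar, which
is not shown in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an ANTLR runtime function): returns true when
 * text equals keyword, ignoring ASCII case. Both strings must be
 * NUL-terminated. */
static bool text_is_keyword(const char *text, const char *keyword)
{
    assert(text != NULL && keyword != NULL);

    while (*text != '\0' && *keyword != '\0') {
        /* Cast to unsigned char before tolower() to avoid undefined
         * behaviour on negative char values. */
        if (tolower((unsigned char)*text) != tolower((unsigned char)*keyword)) {
            return false;
        }
        text++;
        keyword++;
    }

    /* Equal only if both strings ended at the same position. */
    return *text == '\0' && *keyword == '\0';
}
```

Because the check looks at exactly one token, the decision stays cheap and
ANTLR does not need to build lookahead tables to tell a keyword apart from
an identifier.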

The code only LOOKS small in Java because the generated Java uses
run-length-encoded strings for the table values, which it must expand at
runtime - the C target lays down the exact same tables, but as static
data, so that they are set up at compile time. Java is unable to use
compile-time-initialized tables like this until JDK 1.7, so the Java
target must jump through hoops to generate the tables. So in fact
generating the C is a better indicator of how efficient your grammar is.
You can probably trace the table sizes down to a few key decisions.
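
To make the size difference concrete, here is a minimal sketch in C of the
two representations. The table contents, the names full_table,
packed_table and unpack_rle, and the (count, value) pair layout are all
assumptions made for illustration; they are not the actual tables or
encoding that ANTLR emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-entry transition table written out in full, the way a
 * static initializer lays it down at compile time. */
static const int16_t full_table[8] = { -1, -1, -1, -1, -1, 3, 3, 7 };

/* The same table as (count, value) pairs, roughly the shape of a
 * run-length-encoded string that has to be expanded at runtime. */
static const int16_t packed_table[] = { 5, -1, 2, 3, 1, 7 };

/* Expand (count, value) pairs into out. Returns the number of entries
 * written, or 0 if the input is malformed or out is too small. */
static size_t unpack_rle(const int16_t *packed, size_t packed_len,
                         int16_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(packed != NULL && out != NULL);
    if (packed_len % 2 != 0) {
        return 0;               /* pairs must be complete */
    }
    for (size_t i = 0; i < packed_len; i += 2) {
        int16_t count = packed[i];
        int16_t value = packed[i + 1];

        if (count < 0 || n + (size_t)count > out_cap) {
            return 0;           /* negative run or overflow of out */
        }
        for (int16_t k = 0; k < count; k++) {
            out[n++] = value;
        }
    }
    return n;
}
```

The packed form is shorter in the source file but costs an expansion pass
at startup; the full form is larger on disk but needs no work at runtime.
Long runs of identical values compress well, so a grammar with huge tables
can look deceptively small in the packed form.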

Your setText errors are likely because you are not using the SETTEXT
macro correctly in some way. Also, I would avoid doing that at lex time
and do any manipulation only when you actually use the token in question.
I can't help unless I see the lexer code in question though.
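
The compiler output quoted below shows Java-style calls (.substring and
.length()) in the action, which the C compiler rejects; it also points at
subString as the member the C runtime's string type declares. Since the
lexer code is not shown, the following is only a sketch of the "do it when
you use the token" idea in plain C, assuming the token text is available
as a character buffer with a known length. The helper name
strip_delimiters is hypothetical and is not part of the ANTLR C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an ANTLR runtime function): copy text without
 * its first and last character into out and NUL-terminate it.
 * Returns the number of characters written, or (size_t)-1 if text is
 * shorter than two characters or out is too small. */
static size_t strip_delimiters(const char *text, size_t len,
                               char *out, size_t out_cap)
{
    assert(text != NULL && out != NULL);

    if (len < 2 || len - 2 >= out_cap) {
        return (size_t)-1;      /* nothing to strip, or no room for NUL */
    }
    memcpy(out, text + 1, len - 2);
    out[len - 2] = '\0';
    return len - 2;
}
```

Called at the point where the token's text is consumed, this leaves the
token itself untouched in the lexer, so no text manipulation is paid for
on tokens that are never inspected.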

Use the 3.4 beta C runtime - it differs from the release version only in
the API documentation, which I keep trying to finish, but my boat keeps
winking at me and making me go on the river.


Jim



> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Todor Dimitrov
> Sent: Saturday, August 20, 2011 7:39 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Sparql Grammar & Huge C Files
>
> Dear *,
>
> generating the C lexer and parser for the Sparql grammar using the
> options below produces huge files:
>
> options {
> 	language = C;
> 	output = AST;
> 	ASTLabelType = pANTLR3_BASE_TREE;
> }
>
> 2.4K Sparql.tokens
> 85M SparqlLexer.c <---
> 30K SparqlLexer.h
> 1.5M SparqlParser.c <---
> 69K SparqlParser.h
>
> In addition, the files cannot be compiled as it seems that the
> generators have not been updated to reflect the API changes in the
> latest C runtime (or maybe it is the other way round :)). In
> particular, I see errors like these:
>
> SparqlLexer.c:1214276:48: error: member reference type 'pANTLR3_STRING'
> (aka 'struct ANTLR3_STRING_struct *') is a
>       pointer; maybe you meant to use '->'?
>                      setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>                              ~~~~~~~~~~~~~~~~~~~~~^
>                                                   ->
> SparqlLexer.c:1214276:49: error: no member named 'substring' in 'struct
> ANTLR3_STRING_struct'; did you mean 'subString'?
>                      setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>                                                    ^~~~~~~~~
>                                                    subString
> ./antlr3string.h:179:8: note: 'subString' declared here
>                                         (*subString)    (struct ANTLR3_STRING_struct * string, ANTLR3_UINT32 ...
>                                           ^
> SparqlLexer.c:1214276:83: error: member reference type 'pANTLR3_STRING'
> (aka 'struct ANTLR3_STRING_struct *') is a
>       pointer; maybe you meant to use '->'?
>                      setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>                                                                  ~~~~~~~~~~~~~~~~~~~~~^
>                                                                                       ->
> SparqlLexer.c:1214276:84: error: no member named 'length' in 'struct
> ANTLR3_STRING_struct'
>                      setText(LEXER->getText(LEXER).substring(1, LEXER->getText(LEXER).length()-1));
>
>
>
> I'm using antlr 3.4, but I have also tested this with antlr 3.3.
> Generating the Java lexer and parser works as expected and the files
> are much smaller:
>
> 2.4K Sparql.tokens
> 582K SparqlLexer.java
> 876K SparqlParser.java
>
> Any suggestions and help are highly appreciated.
>
> Thanks in advance,
>
> Todor

