[antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35 Speed/Memory

Bob temporaryemail at comcast.net
Sun May 23 17:28:55 PDT 2010


I have gotten rid of the string stuff and memory reduced from 1.7g to
830meg. 830Meg is still a lot bigger than the
11Meg input file. Do we revisit tokenizing of the entire input file to
explain this?
	
Here is a snapshot of the program stack while it's expanding memory. Hope it
helps you!
  
	TestExpr.exe!newPool(ANTLR3_TOKEN_FACTORY_struct *
factory=0x00825a40)  Line 167 + 0xd bytes	C
 	TestExpr.exe!newPoolToken(ANTLR3_TOKEN_FACTORY_struct *
factory=0x00825a40)  Line 190 + 0x9 bytes	C
 	TestExpr.exe!emit(ANTLR3_LEXER_struct * lexer=0x008255d8)  Line 631
+ 0x23 bytes	C
 	TestExpr.exe!nextTokenStr(ANTLR3_TOKEN_SOURCE_struct *
toksource=0x008258d0)  Line 272 + 0x9 bytes	C
 	TestExpr.exe!nextToken(ANTLR3_TOKEN_SOURCE_struct *
toksource=0x008258d0)  Line 316 + 0x9 bytes	C
 	TestExpr.exe!fillBuffer(ANTLR3_COMMON_TOKEN_STREAM_struct *
tokenStream=0x00bd40a8)  Line 964 + 0x15 bytes	C
 	TestExpr.exe!tokLT(ANTLR3_TOKEN_STREAM_struct * ts=0x00bd4128, int
k=1)  Line 319 + 0x9 bytes	C
 	TestExpr.exe!getCurrentInputSymbol(ANTLR3_BASE_RECOGNIZER_struct *
recognizer=0x00bd4408, ANTLR3_INT_STREAM_struct * istream=0x00bd41a0)  Line
2158 + 0x16 bytes	C
 	TestExpr.exe!match(ANTLR3_BASE_RECOGNIZER_struct *
recognizer=0x00bd4408, unsigned int ttype=17, ANTLR3_BITSET_LIST_struct *
follow=0x0041c210)  Line 451 + 0x12 bytes	C
 	TestExpr.exe!module_ansi_header(NuVParser_Ctx_struct *
ctx=0x00bd4310)  Line 893 + 0x1f bytes	C
 	TestExpr.exe!module_declaration(NuVParser_Ctx_struct *
ctx=0x00bd4310)  Line 800 + 0x9 bytes	C
 	TestExpr.exe!description(NuVParser_Ctx_struct * ctx=0x00bd4310)
Line 754 + 0x9 bytes	C
 	TestExpr.exe!source_text(NuVParser_Ctx_struct * ctx=0x00bd4310)
Line 667 + 0x9 bytes	C
 	TestExpr.exe!main(int argc=3, char * * argv=0x00824f78)  Line 35 +
0xe bytes	C++

This is the recommended change to get rid of the strings:

identifier returns [void* node] : SIMPLE_IDENTIFIER
        {
            pANTLR3_COMMON_TOKEN _pToken = $SIMPLE_IDENTIFIER;
            pANTLR3_UINT8 _f=(pANTLR3_UINT8)_pToken->getStartIndex(_pToken);
            pANTLR3_UINT8 _l=(pANTLR3_UINT8)_pToken->getStopIndex(_pToken);
            unsigned int  _size = _l-_f+1;
            char*         _p = (char*)alloca(_size+1);
            memcpy(_p,_f,_size);
            _p[_size] = 0;
            
            $node = act_identifier( _p );
        }
    ;
-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Saturday, May 22, 2010 12:16 PM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
Speed/Memory

The string stuff is just a convenience method for simple stuff. For real
programs you should not do that but use the pointer to start and end that is
contained in the token. Then you can just memcpy straight from the input
source into whatever structure you are using.

Also, if your input is 8 bit then follow the examples and use direct
pointers. If you don't need error recovery then in the next release you can
turn off the stack overhead, or you can do it yourself by undefining the
macrothat does this in the generated code.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Bob
> Sent: Saturday, May 22, 2010 11:33 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> Speed/Memory
> 
> Perhaps you are correct regarding the strings. Looking at the antlr
> generated "identifier" rule code, I see
> 
> "SIMPLE_IDENTIFIER12" is not freed.  Is there a mechanism whereby I can
> free
> the string when the rule is complete?
> 
> 
> 
> After reading the file into memory, the process size is ~11meg (size of
> input file), then it rises to 853Meg before the 1st "module" ..
> "endmodule"
> from the input is recognized. As subsequent "module".."endmodule"'s are
> recognized memory rises to 1.2+Gb.
> 
> 
> 
> If you have a solution to free the strings when a rule is complete,
> I'll try
> that and see where it stands. The expansion from 11Meg to over 1.2Gb is
> quite a bit for strings!
> 
> 
> 
> 
> 
> Rule:
> 
> 
> 
> identifier returns [void* node] : SIMPLE_IDENTIFIER
> 
>         { $node = act_identifier( $SIMPLE_IDENTIFIER.text->chars ); }
> 
>     ;
> 
> 
> 
> Code:
> 
> 
> 
> static void*
> 
> identifier(pNuVParser ctx)
> 
> {
> 
>     void* node = NULL;
> 
> 
> 
>     pANTLR3_COMMON_TOKEN    SIMPLE_IDENTIFIER12;
> 
> 
> 
>     /* Initialize rule variables
> 
>      */
> 
> 
> 
> 
> 
>     SIMPLE_IDENTIFIER12       = NULL;
> 
> 
> 
>     {
> 
>         // NuV.g:102:33: ( SIMPLE_IDENTIFIER )
> 
>         // NuV.g:102:35: SIMPLE_IDENTIFIER
> 
>         {
> 
>             SIMPLE_IDENTIFIER12 = (pANTLR3_COMMON_TOKEN)
> MATCHT(SIMPLE_IDENTIFIER, &FOLLOW_SIMPLE_IDENTIFIER_in_identifier608);
> 
>             if  (HASEXCEPTION())
> 
>             {
> 
>                 goto ruleidentifierEx;
> 
>             }
> 
> 
> 
>             {
> 
>                  node= act_identifier(
> (SIMPLE_IDENTIFIER12->getText(SIMPLE_IDENTIFIER12))->chars );
> 
>             }
> 
> 
> 
>         }
> 
> 
> 
>     }
> 
> 
> 
> 
> 
>     // This is where rules clean up and exit
> 
>     //
> 
>     goto ruleidentifierEx; /* Prevent compiler warnings */
> 
>     ruleidentifierEx: ;
> 
> 
> 
>             if (HASEXCEPTION())
> 
>             {
> 
>                 PREPORTERROR();
> 
>                 PRECOVER();
> 
>             }
> 
> 
> 
> 
> 
>     return node;
> 
> }
> 
> 
> 
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Saturday, May 22, 2010 10:37 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> Speed/Memory
> 
> 
> 
> The other thing to add here is that you are using the $xxx.text
> references,
> and these do not free up the string memory until you free the parser.
> With
> this many inputs, the memory usage you are seeing is probably this
> first and
> not the tokens.
> 
> 
> 
> Jim
> 
> 
> 
> > -----Original Message-----
> 
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> 
> > bounces at antlr.org] On Behalf Of Bob
> 
> > Sent: Friday, May 21, 2010 7:47 PM
> 
> > To: antlr-interest at antlr.org
> 
> > Subject: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> 
> > Speed/Memory
> 
> >
> 
> > A tiny grammar was implemented in both Antlr and Bison+Flex (shown
> 
> > below).
> 
> >
> 
> > Test files repeating two lines (shown below) were made in 6 different
> 
> >
> 
> > sizes.
> 
> >
> 
> > One executable compiled with command line switch choosing either
> 
> >
> 
> > Antlr or Bison+Flex.
> 
> >
> 
> > One run with empty actions, one run with actions populated, to
> compare
> 
> >
> 
> > pure parsing with some actual work.
> 
> >
> 
> >
> 
> >
> 
> > Results:
> 
> >
> 
> >
> 
> > CPU time    Peak Memory
> 
> >
> 
> > File Name     File Size # modules #tokens  Bison Antlr  Bison Antlr
> 
> >
> 
> > Action bodies empty:
> 
> >
> 
> > source.v10m     460mb      10m      150m         28s          572k  *
> 
> >
> 
> > source.v5m       230mb       5m        75m           15s
> 572k
> 
> > *
> 
> >
> 
> > source.v2.5m    115mb       2.5m     37m           7s           572k
> *
> 
> >
> 
> > source.v1m       46mb         1m        15m            2s
> 
> > 572k  *
> 
> >
> 
> > source.v500k    23mb        500k      7.5m          1s
> 572k
> 
> > *
> 
> >
> 
> > source.v250k    11mb        250k      3.7m        <1s   4s     572k
> 
> > 1.7g
> 
> > <-----------
> 
> >
> 
> > Action bodies populated:
> 
> >
> 
> > source.v250k    11mb        250k      3.7m         9s   13s    477m
> 
> > 1.7g
> 
> > <-----------
> 
> >
> 
> >
> 
> >
> 
> > * Antlr ran out of memory at 2gb
> 
> >
> 
> >
> 
> >
> 
> > Comments:
> 
> >
> 
> >
> 
> >
> 
> > 1. I expected the requirement that the entire file be resident in
> 
> > memory
> 
> >
> 
> >    to be the memory glut. Surprise! Quick inspection suggests an
> 
> > initial
> 
> >
> 
> >    tokenizing of the entire in-memory file consumes gobbs of memory,
> 
> > pushing
> 
> >
> 
> >    a small footprint up to 1.7gb before releasing it. Only the
> smallest
> 
> >
> 
> >    test file was under the runable 32 bit 2gb limit.     Please fix!!
> 
> >
> 
> >
> 
> >
> 
> > 2. Speed is clearly slower than bison+flex, however empty actions
> don't
> 
> > make
> 
> >
> 
> >    interesting programs. The test with actions enabled shows a 9s vs.
> 
> > 13s
> 
> >
> 
> >    difference, considerable less than the empty action case.
> 
> >
> 
> >
> 
> >
> 
> > 3. If you've never setup bison+flex I have only one comment: !#@%$#.
> 
> > Two
> 
> >
> 
> >    thumbs up for Antlr.
> 
> >
> 
> >
> 
> >
> 
> > Details:
> 
> >
> 
> >
> 
> >
> 
> >   Vista 64, AMD opteron 2.4Ghz, 16gb ram
> 
> >
> 
> >   Visual Studio 2008 Sp1
> 
> >
> 
> >   One exe file with both Antlr and Bison+Flex, targeting 32 bit
> 
> >
> 
> >   Full Optimization (/Ox), Inline Any suitable (/Ob2), Favor Small
> Code
> 
> > (/Os)
> 
> >
> 
> >   Versions:
> 
> >
> 
> >     Antlr 3.2
> 
> >
> 
> >     Bison 2.4.2 LR(1)
> 
> >
> 
> >     Flex  2.5.35
> 
> >
> 
> >
> 
> >
> 
> >
> 
> >
> 
> > ------------------- Input file -----------------------------
> 
> >
> 
> > module tiptop #(int p1=3, p2=4 );
> 
> >
> 
> > endmodule
> 
> >
> 
> > ... repeat to the indicated number of modules ...
> 
> >
> 
> > ------------------- Antlr Grammar --------------------------
> 
> >
> 
> > source_text : description ( description )*
> 
> >
> 
> >     ;
> 
> >
> 
> > description : module_declaration
> 
> >
> 
> >     ;
> 
> >
> 
> > module_declaration : module_ansi_header ENDMODULE ( ':'
> 
> > module_identifier )?
> 
> >
> 
> >         { act_module(); }
> 
> >
> 
> >     ;
> 
> >
> 
> > module_ansi_header : MODULE_KEYWORD module_identifier (
> 
> > parameter_port_list
> 
> > )? ';'
> 
> >
> 
> >     ;
> 
> >
> 
> > module_identifier : identifier
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_list
> 
> >
> 
> >     : '#' '(' parameter_port_declaration ( ','
> 
> > parameter_port_declaration )*
> 
> > ')'
> 
> >
> 
> >     | '#' '(' ')'
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_declaration returns [void* node]
> 
> >
> 
> > scope {
> 
> >
> 
> >     void* type;
> 
> >
> 
> >     void* head;
> 
> >
> 
> >     void* tail;
> 
> >
> 
> > }
> 
> >
> 
> >     : data_type
> 
> >
> 
> >         { $parameter_port_declaration::type = $data_type.node;
> 
> >
> 
> >             $parameter_port_declaration::head=NULL; }
> 
> > list_of_param_assignments
> 
> >
> 
> >         { $node = $parameter_port_declaration::head; }
> 
> >
> 
> >     ;
> 
> >
> 
> > list_of_param_assignments
> 
> >
> 
> >     : param_assignment ( ',' param_assignment )*
> 
> >
> 
> >     ;
> 
> >
> 
> > param_assignment
> 
> >
> 
> >     : parameter_identifier '=' constant_param_expression
> 
> >
> 
> >         { act_param_assignment
> 
> >
> 
> >             (
> 
> >
> 
> >                 & $parameter_port_declaration::head,
> 
> >
> 
> >                 & $parameter_port_declaration::tail,
> 
> >
> 
> >                 $parameter_identifier.node,
> 
> >
> 
> >                 $parameter_port_declaration::type,
> 
> >
> 
> >                 $constant_param_expression.node
> 
> >
> 
> >             );
> 
> >
> 
> >         }
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_param_expression returns [void* node]
> 
> >
> 
> >     : constant_mintypmax_expression
> 
> >
> 
> >         { $node = $constant_mintypmax_expression.node; }
> 
> >
> 
> > //    | '$'
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_mintypmax_expression returns [void* node]
> 
> >
> 
> >     : constant_expression
> 
> >
> 
> >         { $node = $constant_expression.node; }
> 
> >
> 
> >     ;
> 
> >
> 
> > // Deviate from LRM
> 
> >
> 
> > constant_expression returns [void* node]
> 
> >
> 
> >     : expr { $node = $expr.node; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_identifier returns [void* node]
> 
> >
> 
> >     : identifier { $node = $identifier.node; }
> 
> >
> 
> >     ;
> 
> >
> 
> > data_type returns [void* node]
> 
> >
> 
> >     : integer_atom_type signing
> 
> >
> 
> >        {$node=act_type($integer_atom_type.value,$signing.value);}
> 
> >
> 
> >     | integer_atom_type
> 
> >
> 
> >        {$node=act_type($integer_atom_type.value,-1);}
> 
> >
> 
> >     ;
> 
> >
> 
> >
> 
> >
> 
> >
> 
> >
> 
> > expr returns [void* node] : NUMBER
> 
> >
> 
> >         { $node = act_number( $NUMBER.text->chars ); }
> 
> >
> 
> >         ;
> 
> >
> 
> >
> 
> >
> 
> > identifier returns [void* node] : SIMPLE_IDENTIFIER
> 
> >
> 
> >         { $node = act_identifier( $SIMPLE_IDENTIFIER.text->chars ); }
> 
> >
> 
> >     ;
> 
> >
> 
> >
> 
> >
> 
> > /*------------------------------------------------------------------
> 
> >
> 
> >  * LEXER RULES
> 
> >
> 
> >  *------------------------------------------------------------------
> */
> 
> >
> 
> >
> 
> >
> 
> > integer_atom_type returns [int value]
> 
> >
> 
> >     : TokByte       {$value = TokByte;}
> 
> >
> 
> >     | TokShortint   {$value = TokShortint;}
> 
> >
> 
> >     | TokInt        {$value = TokInt;}
> 
> >
> 
> >     | TokLongint    {$value = TokLongint;}
> 
> >
> 
> >     | TokInteger    {$value = TokInteger;}
> 
> >
> 
> >     | TokTime       {$value = TokTime;}
> 
> >
> 
> >     ;
> 
> >
> 
> > signing returns [int value]
> 
> >
> 
> >     : TokSigned     {$value= TokSigned;}
> 
> >
> 
> >     | TokUnsigned   {$value= TokUnsigned;}
> 
> >
> 
> >     ;
> 
> >
> 
> > MODULE_KEYWORD  : (( 'module' )|('macromodule') )
> 
> >
> 
> >     ;
> 
> >
> 
> > ENDMODULE       : 'endmodule'
> 
> >
> 
> >     ;
> 
> >
> 
> > SIMPLE_IDENTIFIER : ( 'a'..'z'|'A'..'Z'|'_' ) (
> 
> > 'a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
> 
> >
> 
> >     ;
> 
> >
> 
> >
> 
> >
> 
> > NUMBER : (DIGIT)+
> 
> >
> 
> >             ;
> 
> >
> 
> >
> 
> >
> 
> > WHITESPACE  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+
> 
> >
> 
> >               {
> 
> >
> 
> >                  $channel = HIDDEN;
> 
> >
> 
> >               }
> 
> >
> 
> >             ;
> 
> >
> 
> > fragment
> 
> >
> 
> > DIGIT         : '0'..'9'
> 
> >
> 
> >             ;
> 
> >
> 
> > ------------------- Bison Grammar --------------------------
> 
> >
> 
> > %%
> 
> >
> 
> > source_text : description
> 
> >
> 
> >     ;
> 
> >
> 
> > description
> 
> >
> 
> >                 : module_declaration
> 
> >
> 
> >                 | description module_declaration
> 
> >
> 
> >     ;
> 
> >
> 
> > module_declaration
> 
> >
> 
> >                 : module_ansi_header TokEndmodule
> 
> >
> 
> >                 { act_module(); }
> 
> >
> 
> >                 | module_ansi_header TokEndmodule ':'
> module_identifier
> 
> >
> 
> >                 { act_module(); }
> 
> >
> 
> >     ;
> 
> >
> 
> > module_ansi_header
> 
> >
> 
> >                 : TokModule module_identifier ';'
> 
> >
> 
> >                 | TokModule module_identifier parameter_port_list ';'
> 
> >
> 
> >     ;
> 
> >
> 
> > module_identifier : identifier
> 
> >
> 
> >                 { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_list
> 
> >
> 
> >     : '#' '(' parameter_port_list_recur ')'
> 
> >
> 
> >     | '#' '(' ')'
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_list_recur
> 
> >
> 
> >                 : parameter_port_declaration
> 
> >
> 
> >                 | parameter_port_list_recur ','
> 
> > parameter_port_declaration
> 
> >
> 
> >                 ;
> 
> >
> 
> > parameter_port_declaration
> 
> >
> 
> >                 : parameter_port_declaration_scope
> 
> >
> 
> >                     data_type { $1.type = $2; $1.head = NULL; }
> 
> >
> 
> >                       list_of_param_assignments { $$ = $1.head; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_declaration_scope :
> 
> >
> 
> >                 ;
> 
> >
> 
> > list_of_param_assignments
> 
> >
> 
> >     : nil                       nil param_assignment
> 
> >
> 
> > /* FIX:: need LR(2) here */
> 
> >
> 
> >     | list_of_param_assignments ',' param_assignment
> 
> >
> 
> >     ;
> 
> >
> 
> > param_assignment
> 
> >
> 
> >     : parameter_identifier '=' constant_param_expression
> 
> >
> 
> >     { act_param_assignment
> 
> >
> 
> >       (
> 
> >
> 
> >        & $<scope1>-3.head,
> 
> >
> 
> >        & $<scope1>-3.tail,
> 
> >
> 
> >        $1,
> 
> >
> 
> >        $<scope1>-3.type,
> 
> >
> 
> >        $3
> 
> >
> 
> >        );
> 
> >
> 
> >     }
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_param_expression
> 
> >
> 
> >                 : constant_mintypmax_expression { $$ = $1; }
> 
> >
> 
> > //    | '$'
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_mintypmax_expression
> 
> >
> 
> >     : constant_expression { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > // Deviate from LRM
> 
> >
> 
> > constant_expression : expr { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_identifier : identifier
> 
> >
> 
> >     { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > data_type
> 
> >
> 
> >                 : integer_atom_type signing { $$ = act_typeB($1,$2);
> }
> 
> >
> 
> >                 | integer_atom_type         { $$ = act_typeB($1,-1);
> }
> 
> >
> 
> >                 ;
> 
> >
> 
> > expr       : TokNumber
> 
> >
> 
> >                 { $$ = act_number( $1 ); }
> 
> >
> 
> >                 ;
> 
> >
> 
> >
> 
> >
> 
> > nil           : /* empty */
> 
> >
> 
> >     ;
> 
> >
> 
> > identifier : TokIdentifier
> 
> >
> 
> >                 { $$ = act_identifier( $1 ); }
> 
> >
> 
> >     ;
> 
> >
> 
> > integer_atom_type
> 
> >
> 
> >                 : TokByte     { $$ = $1; }
> 
> >
> 
> >                 | TokShortint { $$ = $1; }
> 
> >
> 
> >                 | TokInt      { $$ = $1; }
> 
> >
> 
> >                 | TokLongint  { $$ = $1; }
> 
> >
> 
> >                 | TokInteger  { $$ = $1; }
> 
> >
> 
> >                 | TokTime     { $$ = $1; }
> 
> >
> 
> >                 ;
> 
> >
> 
> > signing  : TokSigned   { $$ = $1; }
> 
> >
> 
> >                 | TokUnsigned { $$ = $1; }
> 
> >
> 
> >                 ;
> 
> >
> 
> > %%
> 
> >
> 
> > ---------------------------------------------------------------
> 
> >
> 
> >
> 
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> 
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> 
> > email-address
> 
> 
> 
> 
> 
> 
> 
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> 
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
> 
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address




List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address



More information about the antlr-interest mailing list