[antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35 Speed/Memory

Jim Idle jimi at temporal-wave.com
Sat May 22 12:15:47 PDT 2010


The string stuff is just a convenience method for simple stuff. For real programs you should not do that but use the pointer to start and end that is contained in the token. Then you can just memcpy straight from the input source into whatever structure you are using.

Also, if your input is 8 bit then follow the examples and use direct pointers. If you don't need error recovery then in the next release you can turn off the stack overhead, or you can do it yourself by undefining the macrothat does this in the generated code.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Bob
> Sent: Saturday, May 22, 2010 11:33 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> Speed/Memory
> 
> Perhaps you are correct regarding the strings. Looking at the antlr
> generated "identifier" rule code, I see
> 
> "SIMPLE_IDENTIFIER12" is not freed.  Is there a mechanism whereby I can
> free
> the string when the rule is complete?
> 
> 
> 
> After reading the file into memory, the process size is ~11meg (size of
> input file), then it rises to 853Meg before the 1st "module" ..
> "endmodule"
> from the input is recognized. As subsequent "module".."endmodule"'s are
> recognized memory rises to 1.2+Gb.
> 
> 
> 
> If you have a solution to free the strings when a rule is complete,
> I'll try
> that and see where it stands. The expansion from 11Meg to over 1.2Gb is
> quite a bit for strings!
> 
> 
> 
> 
> 
> Rule:
> 
> 
> 
> identifier returns [void* node] : SIMPLE_IDENTIFIER
> 
>         { $node = act_identifier( $SIMPLE_IDENTIFIER.text->chars ); }
> 
>     ;
> 
> 
> 
> Code:
> 
> 
> 
> static void*
> 
> identifier(pNuVParser ctx)
> 
> {
> 
>     void* node = NULL;
> 
> 
> 
>     pANTLR3_COMMON_TOKEN    SIMPLE_IDENTIFIER12;
> 
> 
> 
>     /* Initialize rule variables
> 
>      */
> 
> 
> 
> 
> 
>     SIMPLE_IDENTIFIER12       = NULL;
> 
> 
> 
>     {
> 
>         // NuV.g:102:33: ( SIMPLE_IDENTIFIER )
> 
>         // NuV.g:102:35: SIMPLE_IDENTIFIER
> 
>         {
> 
>             SIMPLE_IDENTIFIER12 = (pANTLR3_COMMON_TOKEN)
> MATCHT(SIMPLE_IDENTIFIER, &FOLLOW_SIMPLE_IDENTIFIER_in_identifier608);
> 
>             if  (HASEXCEPTION())
> 
>             {
> 
>                 goto ruleidentifierEx;
> 
>             }
> 
> 
> 
>             {
> 
>                  node= act_identifier(
> (SIMPLE_IDENTIFIER12->getText(SIMPLE_IDENTIFIER12))->chars );
> 
>             }
> 
> 
> 
>         }
> 
> 
> 
>     }
> 
> 
> 
> 
> 
>     // This is where rules clean up and exit
> 
>     //
> 
>     goto ruleidentifierEx; /* Prevent compiler warnings */
> 
>     ruleidentifierEx: ;
> 
> 
> 
>             if (HASEXCEPTION())
> 
>             {
> 
>                 PREPORTERROR();
> 
>                 PRECOVER();
> 
>             }
> 
> 
> 
> 
> 
>     return node;
> 
> }
> 
> 
> 
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Saturday, May 22, 2010 10:37 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> Speed/Memory
> 
> 
> 
> The other thing to add here is that you are using the $xxx.text
> references,
> and these do not free up the string memory until you free the parser.
> With
> this many inputs, the memory usage you are seeing is probably this
> first and
> not the tokens.
> 
> 
> 
> Jim
> 
> 
> 
> > -----Original Message-----
> 
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> 
> > bounces at antlr.org] On Behalf Of Bob
> 
> > Sent: Friday, May 21, 2010 7:47 PM
> 
> > To: antlr-interest at antlr.org
> 
> > Subject: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> 
> > Speed/Memory
> 
> >
> 
> > A tiny grammar was implemented in both Antlr and Bison+Flex (shown
> 
> > below).
> 
> >
> 
> > Test files repeating two lines (shown below) were made in 6 different
> 
> >
> 
> > sizes.
> 
> >
> 
> > One executable compiled with command line switch choosing either
> 
> >
> 
> > Antlr or Bison+Flex.
> 
> >
> 
> > One run with empty actions, one run with actions populated, to
> compare
> 
> >
> 
> > pure parsing with some actual work.
> 
> >
> 
> >
> 
> >
> 
> > Results:
> 
> >
> 
> >
> 
> > CPU time    Peak Memory
> 
> >
> 
> > File Name     File Size # modules #tokens  Bison Antlr  Bison Antlr
> 
> >
> 
> > Action bodies empty:
> 
> >
> 
> > source.v10m     460mb      10m      150m         28s          572k  *
> 
> >
> 
> > source.v5m       230mb       5m        75m           15s
> 572k
> 
> > *
> 
> >
> 
> > source.v2.5m    115mb       2.5m     37m           7s           572k
> *
> 
> >
> 
> > source.v1m       46mb         1m        15m            2s
> 
> > 572k  *
> 
> >
> 
> > source.v500k    23mb        500k      7.5m          1s
> 572k
> 
> > *
> 
> >
> 
> > source.v250k    11mb        250k      3.7m        <1s   4s     572k
> 
> > 1.7g
> 
> > <-----------
> 
> >
> 
> > Action bodies populated:
> 
> >
> 
> > source.v250k    11mb        250k      3.7m         9s   13s    477m
> 
> > 1.7g
> 
> > <-----------
> 
> >
> 
> >
> 
> >
> 
> > * Antlr ran out of memory at 2gb
> 
> >
> 
> >
> 
> >
> 
> > Comments:
> 
> >
> 
> >
> 
> >
> 
> > 1. I expected the requirement that the entire file be resident in
> 
> > memory
> 
> >
> 
> >    to be the memory glut. Surprise! Quick inspection suggests an
> 
> > initial
> 
> >
> 
> >    tokenizing of the entire in-memory file consumes gobbs of memory,
> 
> > pushing
> 
> >
> 
> >    a small footprint up to 1.7gb before releasing it. Only the
> smallest
> 
> >
> 
> >    test file was under the runable 32 bit 2gb limit.     Please fix!!
> 
> >
> 
> >
> 
> >
> 
> > 2. Speed is clearly slower than bison+flex, however empty actions
> don't
> 
> > make
> 
> >
> 
> >    interesting programs. The test with actions enabled shows a 9s vs.
> 
> > 13s
> 
> >
> 
> >    difference, considerable less than the empty action case.
> 
> >
> 
> >
> 
> >
> 
> > 3. If you've never setup bison+flex I have only one comment: !#@%$#.
> 
> > Two
> 
> >
> 
> >    thumbs up for Antlr.
> 
> >
> 
> >
> 
> >
> 
> > Details:
> 
> >
> 
> >
> 
> >
> 
> >   Vista 64, AMD opteron 2.4Ghz, 16gb ram
> 
> >
> 
> >   Visual Studio 2008 Sp1
> 
> >
> 
> >   One exe file with both Antlr and Bison+Flex, targeting 32 bit
> 
> >
> 
> >   Full Optimization (/Ox), Inline Any suitable (/Ob2), Favor Small
> Code
> 
> > (/Os)
> 
> >
> 
> >   Versions:
> 
> >
> 
> >     Antlr 3.2
> 
> >
> 
> >     Bison 2.4.2 LR(1)
> 
> >
> 
> >     Flex  2.5.35
> 
> >
> 
> >
> 
> >
> 
> >
> 
> >
> 
> > ------------------- Input file -----------------------------
> 
> >
> 
> > module tiptop #(int p1=3, p2=4 );
> 
> >
> 
> > endmodule
> 
> >
> 
> > ... repeat to the indicated number of modules ...
> 
> >
> 
> > ------------------- Antlr Grammar --------------------------
> 
> >
> 
> > source_text : description ( description )*
> 
> >
> 
> >     ;
> 
> >
> 
> > description : module_declaration
> 
> >
> 
> >     ;
> 
> >
> 
> > module_declaration : module_ansi_header ENDMODULE ( ':'
> 
> > module_identifier )?
> 
> >
> 
> >         { act_module(); }
> 
> >
> 
> >     ;
> 
> >
> 
> > module_ansi_header : MODULE_KEYWORD module_identifier (
> 
> > parameter_port_list
> 
> > )? ';'
> 
> >
> 
> >     ;
> 
> >
> 
> > module_identifier : identifier
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_list
> 
> >
> 
> >     : '#' '(' parameter_port_declaration ( ','
> 
> > parameter_port_declaration )*
> 
> > ')'
> 
> >
> 
> >     | '#' '(' ')'
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_declaration returns [void* node]
> 
> >
> 
> > scope {
> 
> >
> 
> >     void* type;
> 
> >
> 
> >     void* head;
> 
> >
> 
> >     void* tail;
> 
> >
> 
> > }
> 
> >
> 
> >     : data_type
> 
> >
> 
> >         { $parameter_port_declaration::type = $data_type.node;
> 
> >
> 
> >             $parameter_port_declaration::head=NULL; }
> 
> > list_of_param_assignments
> 
> >
> 
> >         { $node = $parameter_port_declaration::head; }
> 
> >
> 
> >     ;
> 
> >
> 
> > list_of_param_assignments
> 
> >
> 
> >     : param_assignment ( ',' param_assignment )*
> 
> >
> 
> >     ;
> 
> >
> 
> > param_assignment
> 
> >
> 
> >     : parameter_identifier '=' constant_param_expression
> 
> >
> 
> >         { act_param_assignment
> 
> >
> 
> >             (
> 
> >
> 
> >                 & $parameter_port_declaration::head,
> 
> >
> 
> >                 & $parameter_port_declaration::tail,
> 
> >
> 
> >                 $parameter_identifier.node,
> 
> >
> 
> >                 $parameter_port_declaration::type,
> 
> >
> 
> >                 $constant_param_expression.node
> 
> >
> 
> >             );
> 
> >
> 
> >         }
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_param_expression returns [void* node]
> 
> >
> 
> >     : constant_mintypmax_expression
> 
> >
> 
> >         { $node = $constant_mintypmax_expression.node; }
> 
> >
> 
> > //    | '$'
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_mintypmax_expression returns [void* node]
> 
> >
> 
> >     : constant_expression
> 
> >
> 
> >         { $node = $constant_expression.node; }
> 
> >
> 
> >     ;
> 
> >
> 
> > // Deviate from LRM
> 
> >
> 
> > constant_expression returns [void* node]
> 
> >
> 
> >     : expr { $node = $expr.node; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_identifier returns [void* node]
> 
> >
> 
> >     : identifier { $node = $identifier.node; }
> 
> >
> 
> >     ;
> 
> >
> 
> > data_type returns [void* node]
> 
> >
> 
> >     : integer_atom_type signing
> 
> >
> 
> >        {$node=act_type($integer_atom_type.value,$signing.value);}
> 
> >
> 
> >     | integer_atom_type
> 
> >
> 
> >        {$node=act_type($integer_atom_type.value,-1);}
> 
> >
> 
> >     ;
> 
> >
> 
> >
> 
> >
> 
> >
> 
> >
> 
> > expr returns [void* node] : NUMBER
> 
> >
> 
> >         { $node = act_number( $NUMBER.text->chars ); }
> 
> >
> 
> >         ;
> 
> >
> 
> >
> 
> >
> 
> > identifier returns [void* node] : SIMPLE_IDENTIFIER
> 
> >
> 
> >         { $node = act_identifier( $SIMPLE_IDENTIFIER.text->chars ); }
> 
> >
> 
> >     ;
> 
> >
> 
> >
> 
> >
> 
> > /*------------------------------------------------------------------
> 
> >
> 
> >  * LEXER RULES
> 
> >
> 
> >  *------------------------------------------------------------------
> */
> 
> >
> 
> >
> 
> >
> 
> > integer_atom_type returns [int value]
> 
> >
> 
> >     : TokByte       {$value = TokByte;}
> 
> >
> 
> >     | TokShortint   {$value = TokShortint;}
> 
> >
> 
> >     | TokInt        {$value = TokInt;}
> 
> >
> 
> >     | TokLongint    {$value = TokLongint;}
> 
> >
> 
> >     | TokInteger    {$value = TokInteger;}
> 
> >
> 
> >     | TokTime       {$value = TokTime;}
> 
> >
> 
> >     ;
> 
> >
> 
> > signing returns [int value]
> 
> >
> 
> >     : TokSigned     {$value= TokSigned;}
> 
> >
> 
> >     | TokUnsigned   {$value= TokUnsigned;}
> 
> >
> 
> >     ;
> 
> >
> 
> > MODULE_KEYWORD  : (( 'module' )|('macromodule') )
> 
> >
> 
> >     ;
> 
> >
> 
> > ENDMODULE       : 'endmodule'
> 
> >
> 
> >     ;
> 
> >
> 
> > SIMPLE_IDENTIFIER : ( 'a'..'z'|'A'..'Z'|'_' ) (
> 
> > 'a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
> 
> >
> 
> >     ;
> 
> >
> 
> >
> 
> >
> 
> > NUMBER : (DIGIT)+
> 
> >
> 
> >             ;
> 
> >
> 
> >
> 
> >
> 
> > WHITESPACE  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+
> 
> >
> 
> >               {
> 
> >
> 
> >                  $channel = HIDDEN;
> 
> >
> 
> >               }
> 
> >
> 
> >             ;
> 
> >
> 
> > fragment
> 
> >
> 
> > DIGIT         : '0'..'9'
> 
> >
> 
> >             ;
> 
> >
> 
> > ------------------- Bison Grammar --------------------------
> 
> >
> 
> > %%
> 
> >
> 
> > source_text : description
> 
> >
> 
> >     ;
> 
> >
> 
> > description
> 
> >
> 
> >                 : module_declaration
> 
> >
> 
> >                 | description module_declaration
> 
> >
> 
> >     ;
> 
> >
> 
> > module_declaration
> 
> >
> 
> >                 : module_ansi_header TokEndmodule
> 
> >
> 
> >                 { act_module(); }
> 
> >
> 
> >                 | module_ansi_header TokEndmodule ':'
> module_identifier
> 
> >
> 
> >                 { act_module(); }
> 
> >
> 
> >     ;
> 
> >
> 
> > module_ansi_header
> 
> >
> 
> >                 : TokModule module_identifier ';'
> 
> >
> 
> >                 | TokModule module_identifier parameter_port_list ';'
> 
> >
> 
> >     ;
> 
> >
> 
> > module_identifier : identifier
> 
> >
> 
> >                 { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_list
> 
> >
> 
> >     : '#' '(' parameter_port_list_recur ')'
> 
> >
> 
> >     | '#' '(' ')'
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_list_recur
> 
> >
> 
> >                 : parameter_port_declaration
> 
> >
> 
> >                 | parameter_port_list_recur ','
> 
> > parameter_port_declaration
> 
> >
> 
> >                 ;
> 
> >
> 
> > parameter_port_declaration
> 
> >
> 
> >                 : parameter_port_declaration_scope
> 
> >
> 
> >                     data_type { $1.type = $2; $1.head = NULL; }
> 
> >
> 
> >                       list_of_param_assignments { $$ = $1.head; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_port_declaration_scope :
> 
> >
> 
> >                 ;
> 
> >
> 
> > list_of_param_assignments
> 
> >
> 
> >     : nil                       nil param_assignment
> 
> >
> 
> > /* FIX:: need LR(2) here */
> 
> >
> 
> >     | list_of_param_assignments ',' param_assignment
> 
> >
> 
> >     ;
> 
> >
> 
> > param_assignment
> 
> >
> 
> >     : parameter_identifier '=' constant_param_expression
> 
> >
> 
> >     { act_param_assignment
> 
> >
> 
> >       (
> 
> >
> 
> >        & $<scope1>-3.head,
> 
> >
> 
> >        & $<scope1>-3.tail,
> 
> >
> 
> >        $1,
> 
> >
> 
> >        $<scope1>-3.type,
> 
> >
> 
> >        $3
> 
> >
> 
> >        );
> 
> >
> 
> >     }
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_param_expression
> 
> >
> 
> >                 : constant_mintypmax_expression { $$ = $1; }
> 
> >
> 
> > //    | '$'
> 
> >
> 
> >     ;
> 
> >
> 
> > constant_mintypmax_expression
> 
> >
> 
> >     : constant_expression { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > // Deviate from LRM
> 
> >
> 
> > constant_expression : expr { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > parameter_identifier : identifier
> 
> >
> 
> >     { $$ = $1; }
> 
> >
> 
> >     ;
> 
> >
> 
> > data_type
> 
> >
> 
> >                 : integer_atom_type signing { $$ = act_typeB($1,$2);
> }
> 
> >
> 
> >                 | integer_atom_type         { $$ = act_typeB($1,-1);
> }
> 
> >
> 
> >                 ;
> 
> >
> 
> > expr       : TokNumber
> 
> >
> 
> >                 { $$ = act_number( $1 ); }
> 
> >
> 
> >                 ;
> 
> >
> 
> >
> 
> >
> 
> > nil           : /* empty */
> 
> >
> 
> >     ;
> 
> >
> 
> > identifier : TokIdentifier
> 
> >
> 
> >                 { $$ = act_identifier( $1 ); }
> 
> >
> 
> >     ;
> 
> >
> 
> > integer_atom_type
> 
> >
> 
> >                 : TokByte     { $$ = $1; }
> 
> >
> 
> >                 | TokShortint { $$ = $1; }
> 
> >
> 
> >                 | TokInt      { $$ = $1; }
> 
> >
> 
> >                 | TokLongint  { $$ = $1; }
> 
> >
> 
> >                 | TokInteger  { $$ = $1; }
> 
> >
> 
> >                 | TokTime     { $$ = $1; }
> 
> >
> 
> >                 ;
> 
> >
> 
> > signing  : TokSigned   { $$ = $1; }
> 
> >
> 
> >                 | TokUnsigned { $$ = $1; }
> 
> >
> 
> >                 ;
> 
> >
> 
> > %%
> 
> >
> 
> > ---------------------------------------------------------------
> 
> >
> 
> >
> 
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> 
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> 
> > email-address
> 
> 
> 
> 
> 
> 
> 
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> 
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
> 
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address





More information about the antlr-interest mailing list