[antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35 Speed/Memory

Jim Idle jimi at temporal-wave.com
Sat May 22 10:36:53 PDT 2010


The other thing to add here is that you are using the $xxx.text references, and these do not free up the string memory until you free the parser. With this many inputs, the memory usage you are seeing is probably this first and not the tokens.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Bob
> Sent: Friday, May 21, 2010 7:47 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Antlr 3.2 vs. Bison 2.4.2+Flex 2.5.35
> Speed/Memory
> 
> A tiny grammar was implemented in both Antlr and Bison+Flex (shown
> below).
> 
> Test files repeating two lines (shown below) were made in 6 different
> 
> sizes.
> 
> One executable compiled with command line switch choosing either
> 
> Antlr or Bison+Flex.
> 
> One run with empty actions, one run with actions populated, to compare
> 
> pure parsing with some actual work.
> 
> 
> 
> Results:
> 
> 
> CPU time    Peak Memory
> 
> File Name     File Size # modules #tokens  Bison Antlr  Bison Antlr
> 
> Action bodies empty:
> 
> source.v10m     460mb      10m      150m         28s          572k  *
> 
> source.v5m       230mb       5m        75m           15s          572k
> *
> 
> source.v2.5m    115mb       2.5m     37m           7s           572k  *
> 
> source.v1m       46mb         1m        15m            2s
> 572k  *
> 
> source.v500k    23mb        500k      7.5m          1s            572k
> *
> 
> source.v250k    11mb        250k      3.7m        <1s   4s     572k
> 1.7g
> <-----------
> 
> Action bodies populated:
> 
> source.v250k    11mb        250k      3.7m         9s   13s    477m
> 1.7g
> <-----------
> 
> 
> 
> * Antlr ran out of memory at 2gb
> 
> 
> 
> Comments:
> 
> 
> 
> 1. I expected the requirement that the entire file be resident in
> memory
> 
>    to be the memory glut. Surprise! Quick inspection suggests an
> initial
> 
>    tokenizing of the entire in-memory file consumes gobbs of memory,
> pushing
> 
>    a small footprint up to 1.7gb before releasing it. Only the smallest
> 
>    test file was under the runable 32 bit 2gb limit.     Please fix!!
> 
> 
> 
> 2. Speed is clearly slower than bison+flex, however empty actions don't
> make
> 
>    interesting programs. The test with actions enabled shows a 9s vs.
> 13s
> 
>    difference, considerable less than the empty action case.
> 
> 
> 
> 3. If you've never setup bison+flex I have only one comment: !#@%$#.
> Two
> 
>    thumbs up for Antlr.
> 
> 
> 
> Details:
> 
> 
> 
>   Vista 64, AMD opteron 2.4Ghz, 16gb ram
> 
>   Visual Studio 2008 Sp1
> 
>   One exe file with both Antlr and Bison+Flex, targeting 32 bit
> 
>   Full Optimization (/Ox), Inline Any suitable (/Ob2), Favor Small Code
> (/Os)
> 
>   Versions:
> 
>     Antlr 3.2
> 
>     Bison 2.4.2 LR(1)
> 
>     Flex  2.5.35
> 
> 
> 
> 
> 
> ------------------- Input file -----------------------------
> 
> module tiptop #(int p1=3, p2=4 );
> 
> endmodule
> 
> ... repeat to the indicated number of modules ...
> 
> ------------------- Antlr Grammar --------------------------
> 
> source_text : description ( description )*
> 
>     ;
> 
> description : module_declaration
> 
>     ;
> 
> module_declaration : module_ansi_header ENDMODULE ( ':'
> module_identifier )?
> 
>         { act_module(); }
> 
>     ;
> 
> module_ansi_header : MODULE_KEYWORD module_identifier (
> parameter_port_list
> )? ';'
> 
>     ;
> 
> module_identifier : identifier
> 
>     ;
> 
> parameter_port_list
> 
>     : '#' '(' parameter_port_declaration ( ','
> parameter_port_declaration )*
> ')'
> 
>     | '#' '(' ')'
> 
>     ;
> 
> parameter_port_declaration returns [void* node]
> 
> scope {
> 
>     void* type;
> 
>     void* head;
> 
>     void* tail;
> 
> }
> 
>     : data_type
> 
>         { $parameter_port_declaration::type = $data_type.node;
> 
>             $parameter_port_declaration::head=NULL; }
> list_of_param_assignments
> 
>         { $node = $parameter_port_declaration::head; }
> 
>     ;
> 
> list_of_param_assignments
> 
>     : param_assignment ( ',' param_assignment )*
> 
>     ;
> 
> param_assignment
> 
>     : parameter_identifier '=' constant_param_expression
> 
>         { act_param_assignment
> 
>             (
> 
>                 & $parameter_port_declaration::head,
> 
>                 & $parameter_port_declaration::tail,
> 
>                 $parameter_identifier.node,
> 
>                 $parameter_port_declaration::type,
> 
>                 $constant_param_expression.node
> 
>             );
> 
>         }
> 
>     ;
> 
> constant_param_expression returns [void* node]
> 
>     : constant_mintypmax_expression
> 
>         { $node = $constant_mintypmax_expression.node; }
> 
> //    | '$'
> 
>     ;
> 
> constant_mintypmax_expression returns [void* node]
> 
>     : constant_expression
> 
>         { $node = $constant_expression.node; }
> 
>     ;
> 
> // Deviate from LRM
> 
> constant_expression returns [void* node]
> 
>     : expr { $node = $expr.node; }
> 
>     ;
> 
> parameter_identifier returns [void* node]
> 
>     : identifier { $node = $identifier.node; }
> 
>     ;
> 
> data_type returns [void* node]
> 
>     : integer_atom_type signing
> 
>        {$node=act_type($integer_atom_type.value,$signing.value);}
> 
>     | integer_atom_type
> 
>        {$node=act_type($integer_atom_type.value,-1);}
> 
>     ;
> 
> 
> 
> 
> 
> expr returns [void* node] : NUMBER
> 
>         { $node = act_number( $NUMBER.text->chars ); }
> 
>         ;
> 
> 
> 
> identifier returns [void* node] : SIMPLE_IDENTIFIER
> 
>         { $node = act_identifier( $SIMPLE_IDENTIFIER.text->chars ); }
> 
>     ;
> 
> 
> 
> /*------------------------------------------------------------------
> 
>  * LEXER RULES
> 
>  *------------------------------------------------------------------*/
> 
> 
> 
> integer_atom_type returns [int value]
> 
>     : TokByte       {$value = TokByte;}
> 
>     | TokShortint   {$value = TokShortint;}
> 
>     | TokInt        {$value = TokInt;}
> 
>     | TokLongint    {$value = TokLongint;}
> 
>     | TokInteger    {$value = TokInteger;}
> 
>     | TokTime       {$value = TokTime;}
> 
>     ;
> 
> signing returns [int value]
> 
>     : TokSigned     {$value= TokSigned;}
> 
>     | TokUnsigned   {$value= TokUnsigned;}
> 
>     ;
> 
> MODULE_KEYWORD  : (( 'module' )|('macromodule') )
> 
>     ;
> 
> ENDMODULE       : 'endmodule'
> 
>     ;
> 
> SIMPLE_IDENTIFIER : ( 'a'..'z'|'A'..'Z'|'_' ) (
> 'a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
> 
>     ;
> 
> 
> 
> NUMBER : (DIGIT)+
> 
>             ;
> 
> 
> 
> WHITESPACE  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+
> 
>               {
> 
>                  $channel = HIDDEN;
> 
>               }
> 
>             ;
> 
> fragment
> 
> DIGIT         : '0'..'9'
> 
>             ;
> 
> ------------------- Bison Grammar --------------------------
> 
> %%
> 
> source_text : description
> 
>     ;
> 
> description
> 
>                 : module_declaration
> 
>                 | description module_declaration
> 
>     ;
> 
> module_declaration
> 
>                 : module_ansi_header TokEndmodule
> 
>                 { act_module(); }
> 
>                 | module_ansi_header TokEndmodule ':' module_identifier
> 
>                 { act_module(); }
> 
>     ;
> 
> module_ansi_header
> 
>                 : TokModule module_identifier ';'
> 
>                 | TokModule module_identifier parameter_port_list ';'
> 
>     ;
> 
> module_identifier : identifier
> 
>                 { $$ = $1; }
> 
>     ;
> 
> parameter_port_list
> 
>     : '#' '(' parameter_port_list_recur ')'
> 
>     | '#' '(' ')'
> 
>     ;
> 
> parameter_port_list_recur
> 
>                 : parameter_port_declaration
> 
>                 | parameter_port_list_recur ','
> parameter_port_declaration
> 
>                 ;
> 
> parameter_port_declaration
> 
>                 : parameter_port_declaration_scope
> 
>                     data_type { $1.type = $2; $1.head = NULL; }
> 
>                       list_of_param_assignments { $$ = $1.head; }
> 
>     ;
> 
> parameter_port_declaration_scope :
> 
>                 ;
> 
> list_of_param_assignments
> 
>     : nil                       nil param_assignment
> 
> /* FIX:: need LR(2) here */
> 
>     | list_of_param_assignments ',' param_assignment
> 
>     ;
> 
> param_assignment
> 
>     : parameter_identifier '=' constant_param_expression
> 
>     { act_param_assignment
> 
>       (
> 
>        & $<scope1>-3.head,
> 
>        & $<scope1>-3.tail,
> 
>        $1,
> 
>        $<scope1>-3.type,
> 
>        $3
> 
>        );
> 
>     }
> 
>     ;
> 
> constant_param_expression
> 
>                 : constant_mintypmax_expression { $$ = $1; }
> 
> //    | '$'
> 
>     ;
> 
> constant_mintypmax_expression
> 
>     : constant_expression { $$ = $1; }
> 
>     ;
> 
> // Deviate from LRM
> 
> constant_expression : expr { $$ = $1; }
> 
>     ;
> 
> parameter_identifier : identifier
> 
>     { $$ = $1; }
> 
>     ;
> 
> data_type
> 
>                 : integer_atom_type signing { $$ = act_typeB($1,$2); }
> 
>                 | integer_atom_type         { $$ = act_typeB($1,-1); }
> 
>                 ;
> 
> expr       : TokNumber
> 
>                 { $$ = act_number( $1 ); }
> 
>                 ;
> 
> 
> 
> nil           : /* empty */
> 
>     ;
> 
> identifier : TokIdentifier
> 
>                 { $$ = act_identifier( $1 ); }
> 
>     ;
> 
> integer_atom_type
> 
>                 : TokByte     { $$ = $1; }
> 
>                 | TokShortint { $$ = $1; }
> 
>                 | TokInt      { $$ = $1; }
> 
>                 | TokLongint  { $$ = $1; }
> 
>                 | TokInteger  { $$ = $1; }
> 
>                 | TokTime     { $$ = $1; }
> 
>                 ;
> 
> signing  : TokSigned   { $$ = $1; }
> 
>                 | TokUnsigned { $$ = $1; }
> 
>                 ;
> 
> %%
> 
> ---------------------------------------------------------------
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address





More information about the antlr-interest mailing list