[antlr-interest] Re : reuse() methos in 3.4 C runtime/exception report

Mon Nov 21 07:13:19 PST 2011

Hi Jim, Ruslan,

I ran into an issue with a similar kind of usage (iterating over multiple input streams with the same lexer and parser).

I reset the token stream, lexer and parser in the order specified in Jim's answer.

Unfortunately, from the second iteration onward, some tokens in the factory do not have their
input field properly set (until the pools are reallocated).

Resetting the lexer calls the factory reset, but that does not
clean up the input field of tokens already in the pool.

Later on, when new tokens are generated, they are picked from the pool first, but their setCharIndex is not NULL
and their input field is not modified.

If an exception occurs (token stream) in the second or a later file, the reported filename is retrieved from token->input->filename,
which results in confusing error reports.

If this is not clear, let me know and I'll try to give you more details.
If anything is wrong in my analysis, feedback is welcome.

I did not find any simple and clean way with the antlr3 API to reset the input field of tokens that are already in the pool.

Thanks.
Jerome
PS: Anyway, in terms of runtime, this approach does not significantly reduce time in C, but it
makes a huge difference in Java. For my grammar, with a suite of 1800 input files,
it takes around 6 minutes when Java is called on one file at a time, versus seconds
when the input files are processed one after another in a single run.

________________________________
From: Ruslan Zasukhin <ruslan_zasukhin at valentina-db.com>
To: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
Sent: Thursday, 17 November 2011, 22:24
Subject: Re: [antlr-interest] reuse() methos in 3.4 C runtime

Hi Jim,

Below are copy-pastes of my class-wrapper around ANTLR3
Lexer/Parser/TreeParser.
So you can see if I made some stupid mistake...

>> On 6/24/11 7:49 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
>>
>> Because the documentation is not yet up to date, here is an example of
>> reusing the allocated memory in input streams and token streams:
>> 
>>     for (i=0; i<iterations; i++)
>>     {
>>         // Run the parser.
>>         psr->start(psr);
>> 
>>         // --------------------------------------
>>         // Now reset everything for the next run.
>>         // Order of calls is important.
>> 
>>         // Input stream can now be reused
>>         input->reuse(input, sourceCode, sourceLen, sourceName);
>> 
>>         // Reset the common token stream so that it will reuse its resources
>>         tstream->reset(tstream);
>> 
>>         // Reset the lexer (new function generated by antlr now)
>>         lxr->reset(lxr);
>> 
>>         // Reset the parser (new function generated by antlr now)
>>         psr->reset(psr);
>>     }

/**************************************************************************/
void SqlParser_v3::ResuseParserObjects(
    const char*        inTextToParse,
    vuint32            inLength )
{
    // -------------------------------
    // TREE PARSER cannot be reused. Destroy it.
    //
    if( mpTreeParser )
    {
        mpTreeParser->free( mpTreeParser );
        mpTreeParser = NULL;
    }

    if( mpNodes )
    {
        mpNodes->free( mpNodes );
        mpNodes = NULL;
    }    

    // -------------------------------
    // Reuse other objects
    //
    mpInput->reuse(
        mpInput, 
        (pANTLR3_UINT8) inTextToParse,
        (ANTLR3_UINT32) inLength,
        (pANTLR3_UINT8) "VSQL" );

    mpTokenStream->reset( mpTokenStream );
    mpLexer         ->reset( mpLexer );
    mpParser     ->reset( mpParser );

    ResetOwnData( mpParser );
}

And a few other related methods ...

/**************************************************************************/
void SqlParser_v3::Parse_UTF8(
    I_SqlDatabaseEx*           inDatabase,
    const char*                inCommand,
    const char*                inCommandEnd )
{
    argused1(inDatabase);

//    COMMENT this line to force  REUSE() mode ...
//    DestroyParserObjects();

    if( mpInput ) 
        ResuseParserObjects( inCommand, (inCommandEnd - inCommand) );
    else
        CreateParserObjects( inCommand, (inCommandEnd - inCommand) );

    // -------------------------
    // Parse the input expression
    mAST = mpParser->sql( mpParser );

    // If the parser has generated any errors,
    // we throw them as a VSQL exception.
    if( mpParser->pParser->rec->state->errorCount )
    {    
        StToUTF16 cnv( ResultStringBuffer, pErrEnd, GetConverter_UTF8() );
        throw VSQL::xVSQLException( ERR_SQL_PARSER_ERROR, cnv.c_str() );
    }    
}

/**************************************************************************/
void SqlParser_v3::CreateParserObjects(
    const char*        inTextToParse,
    vuint32            inLength )
{
    if( inTextToParse == NULL )
        return; // all objects remain NULL as well.

    // ------------------------------
    // Create INPUT object:

    // NOTE: SQL strings do not have a BOM (the first few bytes, which define
    //       the endianness of UTF16), so for UTF16 we must specify BE or LE
    //       ourselves here.
    mpInput = antlr3StringStreamNew(
        (pANTLR3_UINT8) inTextToParse,
        mEncoding,
        (ANTLR3_UINT32) inLength,
        (pANTLR3_UINT8) "VSQL" );

    mpInput->setUcaseLA( mpInput, ANTLR3_TRUE );

    // ------------------------------
    // Create LEXER v3 object:

    mpLexer = SqlParser_v3LexerNew( mpInput );
    mpTokenStream = antlr3CommonTokenStreamSourceNew(
        ANTLR3_SIZE_HINT, TOKENSOURCE( mpLexer ) );

    // ------------------------------
    // Create PARSER v3 object:

    mpParser = SqlParser_v3ParserNew( mpTokenStream );  // generated by ANTLR3
    mpParser->mDoAllCommands = mDoAllCommandsInitial;

    ResetOwnData( mpParser );
}

/**************************************************************************/
void SqlParser_v3::DestroyParserObjects( void )
{
    // REVERSE ORDER to construction:

    if( mpTreeParser )
    {
        mpTreeParser->free( mpTreeParser );
        mpTreeParser = NULL;
    }

    if( mpNodes )
    {
        mpNodes->free( mpNodes );
        mpNodes = NULL;
    }

    if( mpParser )
    {
        mpParser->mpStartPositions = NULL;

        mpParser->free( mpParser );
        mpParser = NULL;
    }

    if( mpTokenStream )
    {
        mpTokenStream->free( mpTokenStream );
        mpTokenStream = NULL;
    }    

    if( mpLexer )
    {
        mpLexer->free( mpLexer );
        mpLexer = NULL;
    }

    if( mpInput )
    {
        mpInput->close( mpInput );
        mpInput = NULL;
    }
}

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]
