[antlr-interest] OOM error from lexer; ANTLR 2.7.4

Ric Klaren ric.klaren at gmail.com
Tue Aug 9 05:43:25 PDT 2005


Richard Clark wrote:
> Here's a puzzler: I'm getting Out of Memory errors from the lexer when
> my input files don't end in a newline. The offending rule seems to be:
> 
> LINE
>     :    ':'! (~('{'|'\\'|'\n'|'\r') | ESC)*
>     ;

A rule like this is implemented in the generated code as a simple loop,
so by itself this rule cannot trigger out-of-memory situations (at
least I'd be genuinely surprised...).
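As a rough illustration of why a closure like `( ... )*` terminates on its own, here is a hand-written C++ approximation of the LINE rule's loop. It is not the code ANTLR generates. `matchLine` is a hypothetical name, and the definition of ESC (a backslash followed by any one character) is an assumption, since the post doesn't show it. The only point is the control flow: the loop consumes characters while some alternative matches and exits as soon as none does, including at end of input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-written approximation of the LINE rule; NOT ANTLR-generated code.
// ESC is ASSUMED here to be '\\' followed by any single character.
// Returns the matched text (without the leading ':', which the rule drops
// with '!') and advances pos past everything consumed.
std::string matchLine(const std::string& in, std::size_t& pos) {
    assert(pos < in.size() && in[pos] == ':');  // caller saw ':' as LA(1)
    ++pos;                                       // ':'! : match, don't keep
    std::string text;
    for (;;) {                                   // the ( ... )* loop
        if (pos >= in.size())
            break;                               // end of input: loop exits
        char c = in[pos];
        if (c == '\\') {                         // ESC alternative (assumed)
            if (pos + 1 >= in.size())
                break;                           // dangling '\\': this sketch
                                                 // just stops; a real lexer
                                                 // would report an error
            text += in.substr(pos, 2);
            pos += 2;
        } else if (c != '{' && c != '\n' && c != '\r') {
            text += c;                           // ~('{'|'\\'|'\n'|'\r')
            ++pos;
        } else {
            break;                               // no alternative matches
        }
    }
    return text;
}
```

For example, `matchLine(":abc", p)` with `p == 0` returns `"abc"` and leaves `p` at 4, even though the input has no trailing newline.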

It would be helpful to generate the lexer with tracing on and see what
happens (my guess is that it's recursing somewhere, maybe in some rules
that can match nothing?). It would also help to know the target language
and how your lexer/parser is initialized. Does the lexer by itself also
produce out-of-memory errors (if you call it from a simple loop), or does
it only happen when it's hooked up to the parser?
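A minimal harness for the "lexer by itself" test might look like the sketch below. `countTokens` and the `nextTokenType()` interface are hypothetical stand-ins; with the ANTLR 2 C++ target you would instead call the generated lexer's `nextToken()` and compare the returned token's `getType()` with `antlr::Token::EOF_TYPE`. The token cap only catches a lexer that keeps returning tokens (for example empty ones) without ever reaching EOF; a loop that never returns from a single `nextToken()` call is what tracing is for.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Drives a lexer without a parser and counts tokens until EOF.
// Lexer is any type with an int nextTokenType() member (a HYPOTHETICAL
// interface standing in for the generated lexer's nextToken()).
// maxTokens turns "keeps producing tokens forever" into an exception
// instead of letting memory use grow without bound.
template <typename Lexer>
std::size_t countTokens(Lexer& lexer, int eofType, std::size_t maxTokens) {
    assert(maxTokens > 0);
    std::size_t n = 0;
    while (lexer.nextTokenType() != eofType) {
        if (++n > maxTokens)
            throw std::runtime_error("lexer did not reach EOF");
    }
    return n;  // number of non-EOF tokens seen
}
```

Feeding it a test file that lacks a trailing newline, first alone and then with the parser attached, narrows down which side is misbehaving.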

> Is there something special I should be doing to catch an EOF, possibly
> splicing in one final newline?

In general a lexer needs no extra handling for EOF unless you're dealing
with token streams and the like (sometimes you have to override the
uponEOF method to do cleanup at EOF). In some cases you can also check
for EOF explicitly in action code, by looking at the lookahead
characters, to get better error handling, as in this nested C comment rule:

C_COMMENT options { paraphrase = "'C-comment'"; }
{
    unsigned int nstarts = 0;  // keep track of nesting level
}
    :   '/' { $setType(DIV); }            // a lone '/' is the DIV token
        (   '*' { $setType(antlr::Token::SKIP); nstarts++; }
            (   {
                    // fewer than two characters left: the closing "*/"
                    // can never follow, so report an unclosed comment
                    if( LA(1) == EOF || LA(2) == EOF )
                        throw antlr::RecognitionException("Unclosed comment",
                            getFilename(),
                            inputState->tokenStartLine,
                            inputState->tokenStartColumn );
                }
            :   { (nstarts > 1) || (LA(2) != '/') }? '*'
                {
                    // this '*' closes a nested level
                    if( LA(1) == '/' )
                        nstarts--;
                }
            |   '\n' { newline(); }
            |   '\r'
            |   '\t' { tab(); }
            |   "/*" { nstarts++; }       // a nested comment opens
            |   ~('*'|'\n'|'\r'|'\t')
            )*
            "*/"                          // closes the outermost comment
        )?
    ;
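To see the nesting-counter idea outside the grammar, here is a simplified plain C++ sketch. `skipNestedComment` is a hypothetical helper, and `std::runtime_error` stands in for `antlr::RecognitionException`. It treats "/*" and "*/" as two-character units, so it may differ from the rule on edge cases such as "*/*", where the rule can reuse the '/' of a nested "*/" to open a new level. The EOF test mirrors the rule's `LA(1) == EOF || LA(2) == EOF` check.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Simplified sketch of the C_COMMENT nesting logic (not ANTLR code).
// Starting at an opening "/*" at position pos, returns the index just past
// the matching outermost "*/", or throws if the input ends first.
std::size_t skipNestedComment(const std::string& in, std::size_t pos) {
    assert(in.compare(pos, 2, "/*") == 0);
    pos += 2;
    unsigned nstarts = 1;              // nesting level, as in the rule
    for (;;) {
        if (in.size() - pos < 2)       // LA(1) == EOF || LA(2) == EOF
            throw std::runtime_error("Unclosed comment");
        if (in[pos] == '*' && in[pos + 1] == '/') {
            pos += 2;
            if (--nstarts == 0)
                return pos;            // outermost "*/" ends the comment
        } else if (in[pos] == '/' && in[pos + 1] == '*') {
            pos += 2;
            ++nstarts;                 // "/*" opens a nested level
        } else {
            ++pos;                     // any other character
        }
    }
}
```

For instance, on `"x = /* a /* b */ c */;"` starting at index 4 it returns 21, the position of the `;`.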

Cheers,

Ric

PS: Just for testing purposes, could you try 2.7.5 or a development
snapshot?

