[antlr-interest] OOM error from lexer; ANTLR 2.7.4
Ric Klaren
ric.klaren at gmail.com
Tue Aug 9 05:43:25 PDT 2005
Richard Clark wrote:
> Here's a puzzler: I'm getting Out of Memory errors from the lexer when
> my input files don't end in a newline. The offending rule seems to be:
>
> LINE
> : ':'! (~('{'|'\\'|'\n'|'\r') | ESC)*
> ;
A rule like this is implemented in the generated code as a simple loop,
so by itself this rule cannot trigger an out-of-memory situation (at
least I'd be genuinely surprised...).
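To illustrate the point, here's a simplified hand-written analogue of the
matching loop ANTLR generates for the (...)* subrule (not the literal
2.7.4 output; the ESC alternative is elided and matchLINE is a made-up
name). It consumes one character per iteration and allocates next to
nothing, so the loop itself has nowhere to run away:

    #include <iostream>
    #include <string>

    // Simplified analogue of the generated loop for
    // (~('{'|'\\'|'\n'|'\r') | ESC)*
    static void matchLINE(std::istream& in, std::string& text)
    {
        for (;;) {
            int c = in.peek();
            // Exit the (...)* loop on EOF or on any character
            // outside the negated set.
            if (c == std::char_traits<char>::eof() ||
                c == '{' || c == '\\' || c == '\n' || c == '\r')
                break;
            text += static_cast<char>(in.get()); // consume one char
        }
    }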
It would be helpful to try and see what happens if you generate the
lexer with tracing on (my guess is that it's recursing somewhere, maybe
in some rules that can match nothing?). The target language and how your
lexer/parser is initialized would also be helpful to know. Is the lexer
by itself also generating out-of-memory errors (if you call it from a
simple loop), or does it only happen when hooked up to the parser?
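Something like this minimal standalone driver would do (the class name
MyLexer stands in for whatever your generated lexer is called; generating
with the tool's -traceLexer switch additionally gives you rule
entry/exit traces):

    #include <iostream>
    #include "MyLexer.hpp"   // your generated lexer (name assumed)

    int main()
    {
        try {
            MyLexer lexer(std::cin);
            // Pull tokens straight from the lexer, no parser attached.
            for (antlr::RefToken t = lexer.nextToken();
                 t->getType() != antlr::Token::EOF_TYPE;
                 t = lexer.nextToken())
            {
                std::cout << t->getType() << ": "
                          << t->getText() << std::endl;
            }
        }
        catch (antlr::ANTLRException& e) {
            std::cerr << "exception: " << e.toString() << std::endl;
        }
        return 0;
    }

If the lexer is the culprit you should see it spin or throw here even
with no parser in the picture.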
> Is there something special I should be doing to catch an EOF, possibly
> splicing in one final newline?
For a lexer, EOF needs (in general) no extra handling unless you're
dealing with token streams and such (sometimes you have to override the
uponEOF method for EOF cleanups). In some cases you can also look (in
action code) explicitly at the current token and detect EOF for better
error handling, like in this nested C comment rule:
C_COMMENT options { paraphrase = "'C-comment'"; }
    {
        unsigned int nstarts = 0; // keep track of nesting level..
    }
    :   '/' { $setType(DIV); }
        ( '*' { $setType(antlr::Token::SKIP); nstarts++; }
            ( {
                  if( LA(1) == EOF || LA(2) == EOF )
                      throw antlr::RecognitionException("Unclosed comment",
                                                        getFilename(),
                                                        inputState->tokenStartLine,
                                                        inputState->tokenStartColumn );
              }:
                { (nstarts > 1) || (LA(2) != '/') }? '*'
                {
                    if(LA(1) == '/')
                        nstarts--;
                }
            |   '\n' { newline(); }
            |   '\r'
            |   '\t' { tab(); }
            |   "/*" { nstarts++; }
            |   ~('*'|'\n'|'\r'|'\t')
            )*
            "*/"
        )?
    ;
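For completeness, overriding uponEOF usually looks something like the
sketch below. The class name and the inComment flag are made up, and the
exact signature may differ a bit between runtimes, so treat it as a
sketch rather than gospel:

class MyLexer extends Lexer;

{
public:
    // Hook called by the runtime when EOF is reached; a convenient
    // place for end-of-input cleanups or checks. inComment is a
    // hypothetical flag you'd set/clear from rule actions (and
    // initialize yourself before lexing starts).
    virtual void uponEOF()
    {
        if (inComment)
            throw antlr::TokenStreamException("EOF inside a comment");
    }
private:
    bool inComment;
}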
Cheers,
Ric
PS: just for testing purposes, could you try out 2.7.5 or a development
snapshot?