[antlr-interest] java.lang.OutOfMemoryError: Java heap space

Wed Jun 6 09:27:10 PDT 2007

On 6/6/2007, at 17:16, Jim Idle wrote:

> I think you just want to move most of this to the parser and be  
> done with it Wincent. As you are not trying to do anything with the  
> URI, just recognize it, then complicating the lexer so you can have  
> one URI token does not get you anywhere. Instead of using 'URI' in  
> your parser, you just use 'uri'. I don't think it is analysis bugs,  
> I think it is just that you have produced a massively complicated  
> lexer.

I would like to try this, but I'm afraid I don't have the experience  
with ANTLR to be able to pull it off... As I start moving things "up"  
into the parser it becomes harder and harder to write non-ambiguous  
lexer rules...

So what I have done for the time being is simplify the lexer as much
as possible; you can check out the results here:

<http://pastie.textmate.org/68305>

This generates a lexer which is 1077 lines of Java code. It can no  
longer claim to be a true RFC 3986 recognizer, because in order to  
make this simpler I no longer attempt to recognize IPv6 literals, nor  
what the RFC refers to as "IPvFuture". At some point in the future  
when I am an ANTLR guru I'll hopefully be able to revisit this and  
make a rigorous recognizer of RFC 3986-compliant URIs...
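
To illustrate what is left of the host part once IP literals are
dropped, here is a minimal hand-written C sketch. It is not the ANTLR
grammar from the pastie; is_simple_host and its helpers are
hypothetical names chosen for illustration, and the sketch assumes the
host has already been split out of the rest of the URI. It accepts
only the RFC 3986 "reg-name" production (which also covers
dotted-decimal IPv4 addresses) and rejects anything in square
brackets, i.e. IPv6 literals and "IPvFuture".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" (RFC 3986, 2.3) */
static int is_unreserved(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

/* sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" */
static int is_sub_delim(unsigned char c)
{
    return c != '\0' && strchr("!$&'()*+,;=", c) != NULL;
}

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if s matches reg-name = *( unreserved / pct-encoded / sub-delims ),
 * 0 otherwise. '[' and ']' are not in any of those sets, so bracketed
 * IP literals (IPv6, IPvFuture) are rejected.
 */
int is_simple_host(const char *s)
{
    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;
        if (is_unreserved(c) || is_sub_delim(c)) {
            s++;
        } else if (c == '%' && isxdigit((unsigned char)s[1])
                            && isxdigit((unsigned char)s[2])) {
            s += 3; /* pct-encoded = "%" HEXDIG HEXDIG */
        } else {
            return 0;
        }
    }
    return 1;
}
```

With this, is_simple_host("example.com") and
is_simple_host("192.168.0.1") both succeed, while
is_simple_host("[::1]") fails, which is the kind of input the
simplified lexer no longer tries to handle.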

> On the number of lines generated, the C output contains a lot of  
> whitespace, comments (especially in lexer rules) and of course  
> formatting of '{' in C style and so will make you feel you are  
> getting more code lines than you actually are, but you will still  
> need lots of them for this lexer!

Nah, the extra size of the C output compared with the Java doesn't  
really bother me. I understand it's just a question of style and  
conventions. I love the C target! And I know nothing about Java, so  
I'd be lost without the C target...

Cheers,
Wincent