[antlr-interest] java.lang.OutOfMemoryError: Java heap space
Wincent Colaiuta
win at wincent.com
Wed Jun 6 09:27:10 PDT 2007
On 6/6/2007, at 17:16, Jim Idle wrote:
> I think you just want to move most of this to the parser and be
> done with it, Wincent. As you are not trying to do anything with the
> URI, just recognize it, then complicating the lexer so you can have
> one URI token does not get you anywhere. Instead of using 'URI' in
> your parser, you just use 'uri'. I don't think it is analysis bugs,
> I think it is just that you have produced a massively complicated
> lexer.
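To illustrate the split Jim describes, here is a rough sketch written in Python rather than as an ANTLR grammar. It covers only a hypothetical, much-simplified URI subset (scheme "://" host [":" port] path segments), not full RFC 3986. The lexer emits one tiny token per character, and a parser-level `uri` rule does all the structural recognition. The names `lex`, `Parser`, `ParseError` and `is_uri` are illustrative only and are not ANTLR-generated code.

```python
import string

# Hypothetical one-character token kinds for this illustrative lexer.
PUNCT = {":": "COLON", "/": "SLASH", ".": "DOT", "-": "DASH", "+": "PLUS"}


class ParseError(Exception):
    """Raised when the input does not match the simplified grammar."""


def lex(text):
    """Lexer: emit one small token per character, with no URI-level structure."""
    tokens = []
    for ch in text:
        if ch in string.ascii_letters:
            tokens.append("ALPHA")
        elif ch in string.digits:
            tokens.append("DIGIT")
        elif ch in PUNCT:
            tokens.append(PUNCT[ch])
        else:
            raise ParseError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    """Recursive-descent parser: the URI structure lives here, not in the lexer."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def match(self, kind):
        if self.peek() != kind:
            raise ParseError(f"expected {kind}, got {self.peek()} at token {self.pos}")
        self.pos += 1

    def zero_or_more(self, kinds):
        while self.peek() in kinds:
            self.pos += 1

    def one_or_more(self, kinds):
        if self.peek() not in kinds:
            raise ParseError(f"expected one of {sorted(kinds)} at token {self.pos}")
        self.zero_or_more(kinds)

    # uri : scheme ':' '//' host (':' port)? ('/' segment)* EOF
    def uri(self):
        self.match("ALPHA")  # scheme must start with a letter
        self.zero_or_more({"ALPHA", "DIGIT", "PLUS", "DASH", "DOT"})
        self.match("COLON")
        self.match("SLASH")
        self.match("SLASH")
        self.one_or_more({"ALPHA", "DIGIT", "DASH", "DOT"})  # host (reg-name subset only)
        if self.peek() == "COLON":
            self.match("COLON")
            self.one_or_more({"DIGIT"})  # port
        while self.peek() == "SLASH":
            self.match("SLASH")
            self.zero_or_more({"ALPHA", "DIGIT", "DASH", "DOT"})  # path segment
        if self.peek() != "EOF":
            raise ParseError(f"trailing input at token {self.pos}")


def is_uri(text):
    """Return True if text matches the simplified URI subset."""
    try:
        Parser(lex(text)).uri()
        return True
    except ParseError:
        return False
```

For example, `is_uri("http://pastie.textmate.org/68305")` is accepted while `is_uri("http:/x")` is rejected. The point is only the division of labour: the lexer stays trivial and unambiguous, and the `uri` rule does the recognizing, so no single lexer rule has to encode the whole RFC.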
I would like to try this, but I'm afraid I don't have the experience
with ANTLR to be able to pull it off... As I start moving things "up"
into the parser, it becomes harder and harder to write unambiguous
lexer rules...
So what I have done in the meantime is simplify the lexer as much
as possible; you can check out the results here:
<http://pastie.textmate.org/68305>
This generates a lexer which is 1077 lines of Java code. It can no
longer claim to be a true RFC 3986 recognizer, because in order to
make this simpler I no longer attempt to recognize IPv6 literals, nor
what the RFC refers to as "IPvFuture". At some point in the future
when I am an ANTLR guru I'll hopefully be able to revisit this and
make a rigorous recognizer of RFC 3986-compliant URIs...
> On the number of lines generated, the C output contains a lot of
> whitespace, comments (especially in lexer rules) and of course
> formatting of '{' in C style and so will make you feel you are
> getting more code lines than you actually are, but you will still
> need lots of them for this lexer!
Nah, the extra size of the C output compared with the Java doesn't
really bother me. I understand it's just a question of style and
conventions. I love the C target! And I know nothing about Java, so
I'd be lost without the C target...
Cheers,
Wincent