[antlr-interest] java.lang.OutOfMemoryError: Java heap space
Wincent Colaiuta
win at wincent.com
Wed Jun 6 15:24:09 PDT 2007
On 6/6/2007, at 22:01, Robin Davies wrote:
>> fragment DEC_OCTET
>> : DIGIT // 0-9
>> | '1'..'9' DIGIT // 10-99
>> | '1' DIGIT DIGIT // 100-199
>> | '2' '0'..'4' DIGIT // 200-249
>> | '25' '0'..'5' // 250-255
>> ;
> Have to wonder whether this is really a smart thing to do. You're
> using a lexer to enforce a semantic restriction: namely that a
> DEC_OCTET must have a value between 0 and 255.
>
> From an efficiency point of view, wouldn't it make sense to go for
> DEC_OCTET: DIGIT+; (not a fragment!)
> and then build addresses at the parser level rather than at the
> lexer level, and enforce semantic restrictions either with
> predicates, or (even better, I think) in the processing code?
I think you're probably right. I'm still trying to come to grips with
all these boundaries (lexer/parser, terminal/non-terminal, syntactic/
semantic etc).
> One of the downsides of this kind of semantic enforcement lexically
> is that you end up with crazy error messages like:
>
> Input: 1.1.257.1
> Error: Expecting ".", "0", "1", "2", "3", "4", or "5".
> (not a very helpful error message, in my opinion).
>
> If you handle this error at a semantic level then you can provide
> much more semantically relevant error messages like:
> "Malformed IPv4 address".
Yes, I again think you're right. Luckily there's a chapter on error
handling in Ter's book... will have to study up on it! I also need to
figure out how (and if) it can be done when using the C target...
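
To make the split concrete, here is a minimal sketch in Python (used purely for illustration; it is not ANTLR and not the C target). The loose pattern plays the role of a DIGIT+ octet rule, and the range check is moved into ordinary processing code. The names LOOSE_IPV4 and parse_ipv4 are hypothetical, and the exact wording of the messages is an assumption.

```python
import re

# Loose "lexical" shape: four runs of ASCII digits separated by dots.
# This mirrors DEC_OCTET : DIGIT+ rather than the 0-255 alternatives.
# Unlike the original rule, it also lets leading zeros such as "01" through.
LOOSE_IPV4 = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)")

def parse_ipv4(text):
    """Return the four octets as ints, or raise ValueError with a semantic message."""
    match = LOOSE_IPV4.fullmatch(text)
    if match is None:
        # The shape itself is wrong (too few octets, stray characters, ...).
        raise ValueError(f"Malformed IPv4 address: {text!r}")
    octets = [int(group) for group in match.groups()]
    # Semantic check: each octet must fit in 0-255.
    for position, value in enumerate(octets, start=1):
        if value > 255:
            raise ValueError(
                f"Malformed IPv4 address: octet {position} is {value}, expected 0-255"
            )
    return octets
```

With this arrangement, an input like 1.1.257.1 is accepted by the loose pattern and rejected by the range check, so the message can name the offending octet instead of listing the characters the lexer expected.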
> Not knowing the details of the ANTLR DFA implementation, I have to
> think that the amount of state information that a DFA has to track
> is going to be crazy by the time you get a few octets into an IPv4
> address. It doesn't surprise me that the size of the lexer DFA goes
> ballistic, though.
I think the IPv4 address isn't too crazy, but the IPv6 one definitely
is... I think you're right that the only way to handle it will be to
use much looser restrictions at the syntactic level and then check at
the semantic level.
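
A rough sketch of the same idea for IPv6, again in Python for illustration only. The loose pattern just accepts a run of hex digits, colons and dots, and the real validation is delegated to the standard library's ipaddress module, which stands in here for whatever semantic check the application would perform. LOOSE_IPV6 and check_ipv6 are hypothetical names.

```python
import ipaddress
import re

# Deliberately loose "lexical" shape: any run of hex digits, colons and dots.
# No attempt is made to count groups or handle "::" at this level.
LOOSE_IPV6 = re.compile(r"[0-9A-Fa-f:.]+")

def check_ipv6(text):
    """Return an ipaddress.IPv6Address, or raise ValueError with a semantic message."""
    if LOOSE_IPV6.fullmatch(text) is None:
        raise ValueError(f"Malformed IPv6 address: {text!r}")
    try:
        # Semantic level: group counts, "::" compression, embedded IPv4, etc.
        return ipaddress.IPv6Address(text)
    except ipaddress.AddressValueError as error:
        raise ValueError(f"Malformed IPv6 address: {text!r} ({error})") from error
```

The pattern needs only a trivial recognizer, while all the combinatorial detail lives in the check that runs afterwards.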
> I'm struggling with this in some of the sample grammars. I can't
> help thinking (for example) that it's a very bad idea to treat
> "\z" in a C/C++/C# string as a lexical error ("not expecting 'z'")
> rather than a semantic error ("illegal escape sequence").
Most definitely...
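
For the escape-sequence case, a small sketch along the same lines (Python, illustration only). It assumes the lexer has already accepted a backslash followed by any character, and that the string body arrives with its quotes stripped. SIMPLE_ESCAPES is an assumed subset of the legal escapes, not the full C list, and unescape is a hypothetical helper.

```python
# Escapes this sketch treats as legal; an assumed subset, not the full C list.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "0": "\0"}

def unescape(body):
    """Decode the body of a string literal (quotes already stripped)."""
    result = []
    index = 0
    while index < len(body):
        char = body[index]
        if char != "\\":
            # Ordinary character: copy it through unchanged.
            result.append(char)
            index += 1
            continue
        if index + 1 >= len(body):
            raise ValueError("unterminated escape sequence at end of string")
        escape = body[index + 1]
        if escape not in SIMPLE_ESCAPES:
            # Semantic error: lexically fine, but not a known escape.
            raise ValueError(f"illegal escape sequence '\\{escape}' at offset {index}")
        result.append(SIMPLE_ESCAPES[escape])
        index += 2
    return "".join(result)
```

Here "\z" gets reported as an illegal escape sequence together with its offset, rather than as an unexpected 'z'.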
Cheers,
Wincent