[antlr-interest] v4 questions

Sam Harwell sam at tunnelvisionlabs.com
Fri Jan 13 06:15:24 PST 2012


Hi Jon,

I believe the static initializer for an int[] is emitted as bytecode, which
would cause the class's static initializer to exceed its maximum size well
before the string representation reaches its limit.
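
Roughly what I mean, as a sketch (mine, not the actual generated code; the
int[] field name is made up):

    // A large int[] initializer is compiled into bytecode inside the
    // class's static initializer (<clinit>), and the JVM limits any
    // single method to 65535 bytes of bytecode, so a big table
    // overflows that limit quickly. A String literal is stored in the
    // constant pool instead (its own limit is 65535 bytes of modified
    // UTF-8), so it never touches <clinit>.
    static final int[] _ATNData =
        { 3, 51, 24, 8 /* ...thousands of entries... */ };
    static final String _serializedATN =
        "\u0003\u0033\u0018\u0008";   // truncated, for illustration only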

V4 is currently slower than V3 when run on exactly the same grammar. The
current focus is on keeping the algorithms simple so they can be fully
documented and so errors are easier to track down and fix. Most performance
issues will be addressed later. In the meantime:

1. If you're running a 32-bit version of the JVM, make sure you launch your
application with the "-server" and "-Xmx1024m" flags. V4 currently uses ***a
lot*** of memory, and the "client" VM (and its default GC) is poorly suited
to it. The 64-bit JVM only offers the server VM and handles large-memory
scenarios better. Java 1.7 should be faster than 1.6. By "a lot", I mean V4
should use a similar amount of memory for storing CommonToken[] data, but the
static overhead of holding the DFA/ATN can easily be 10X the size of V3 (I've
seen it reach 150+MB while parsing Java). Small grammars are not nearly as
affected by this issue.

2. In V4, the DFA for a parser is cached per parser instance. Rather than
using "MyParser mySecondParser = new MyParser(secondInputStream)", you can
call "myFirstParser.setInputStream(secondInputStream)" to get a substantial
performance boost if you're parsing more than one input (see the sketch after
this list). Note that the parse routine is not thread-safe, so if you are
parsing on multiple threads you'll need one parser instance per thread. This
technique applies to both lexers and parsers. Also note that two parser
instances alive at the same time will use twice the memory for holding DFAs.

3. In V4, lexers are implemented radically differently from V3, and they
should be at least as fast as V3 (faster in the majority of cases) as long as
you use the technique from point #2.

4. If you convert your parser to use V4's new left-recursive expression rule
syntax, you can get a substantial performance boost (a brief example follows
this list).
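
Here is roughly what I mean for #2, as a sketch. MyLexer, MyParser, and
startRule() stand in for your generated classes and start rule, and
ANTLRInputStream stands in for whatever char-stream class the current v4
build provides, so adjust the names to match:

    import org.antlr.v4.runtime.*;

    public class ReuseDemo {                // hypothetical driver class
        public static void main(String[] args) {
            // Reuse one lexer/parser pair so the DFA cached inside each
            // instance carries over from the first parse to the next.
            MyLexer lexer = new MyLexer(new ANTLRInputStream("a + b * c"));
            MyParser parser = new MyParser(new CommonTokenStream(lexer));
            parser.startRule();             // first parse builds the DFA

            lexer.setInputStream(new ANTLRInputStream("x * y + z"));    // keep lexer DFA
            parser.setInputStream(new CommonTokenStream(lexer));        // keep parser DFA
            parser.startRule();             // second parse runs on a warm DFA
        }
    }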
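
And for #4, the v4 left-recursive style lets you write an expression rule
directly instead of layering one rule per precedence level. Something like
this (rule and token names are only for illustration):

    expr : expr '*' expr    // listed first, so it binds tighter than '+'
         | expr '+' expr
         | '(' expr ')'
         | INT
         ;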

Just so you know, I have an experimental build that I've been working on
locally and that I plan to reference when we start looking at performance
issues in the future. In a head-to-head comparison (not using V4's LR
syntax), this build is faster than V3 and uses slightly less memory. If you
compare against a grammar using LR expression syntax, [this build of] V4
outperforms V3 by about a 3:1 margin in about half the memory.

--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com


-----Original Message-----
From: JonB [mailto:blinku at gmail.com] 
Sent: Friday, January 13, 2012 3:51 AM
To: antlr-interest
Subject: [antlr-interest] v4 questions

Hello Terence!
I've been looking at v4 for a few days and have a couple of questions for you.
1. Would it be possible (maybe even better?) to change the type of
_serializedATN from String to int[] and do the String -> char[] -> int[]
conversion in the Tool (the parser generator)? That would be friendlier for
other language runtimes that don't have Java's toCharArray() method or don't
support octal escapes in string literals. Another reason is that you can hit
the "constant string too long" Java error on a really big grammar.
2. Is it normal that the v4 parser is currently slower than the v3 parser
(with the same grammar)?
Jon B.
