[antlr-interest] Coding rule checking for Ada 95

Ron Burk ronburk at gmail.com
Fri Apr 16 06:54:53 PDT 2010


> 3 - the engine must be very scalable (in order to process millions of LoC).

Scalable is a word that usually needs a lot of pinning down.
Does it only have to scale to the size of the largest currently
existing Ada program? Would handling 100 million lines
of code cover most situations?

>      (??? any ideas to store AST efficiently on disk ???)

Now you introduce a different scalability problem, since
the speed gap between memory and disk (already several
orders of magnitude: roughly 100ns for a DRAM access
versus 10ms for a disk seek) has only widened over the
years. Presumably you have some constraints on how long
the program can take to execute? How many days or weeks
are allowed for processing a 100 million-line program?

A 100 million-line Ada program requires, say, 10GB of
memory as raw source bytes (100 million lines at an
assumed 100 bytes per line is 10^10 bytes). With care
(though perhaps not in your favorite interpreted
language), your own data structures may fit in the space
freed up by stripping whitespace and comments, so I would
guesstimate it is possible to handle 100 million lines in
10GB. The time required just to read 10GB sequentially
(recalling Jim Gray's observation that disks now begin to
look like sequential tapes to modern CPUs) will be
unpleasant enough -- I'm dubious you could ever make it
tolerable to thrash about on 10GB doing tree traversals
via random disk I/O, random access being even slower than
the initial sequential pass over the source.
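
To put rough numbers on that, here is a back-of-envelope
sketch; the 100 MB/s sequential throughput, 10ms seek time,
and one-4KB-page-per-node-visit figures are my assumptions
for 2010-era commodity disks, not measurements:

    #include <cstdio>

    int main() {
        const double bytes         = 10.0 * 1e9;   // 10GB corpus
        const double seq_bytes_sec = 100.0 * 1e6;  // assumed 100 MB/s sequential
        const double seek_seconds  = 0.010;        // assumed 10ms per random seek
        const double page_bytes    = 4096.0;       // one page fetched per AST visit

        // One sequential pass over the whole corpus.
        std::printf("sequential pass: ~%.0f seconds\n",
                    bytes / seq_bytes_sec);

        // A traversal that touches each page once, in random order.
        const double seeks = bytes / page_bytes;
        std::printf("random traversal: ~%.0f seeks, ~%.1f hours\n",
                    seeks, seeks * seek_seconds / 3600.0);
        return 0;
    }

With those figures the sequential pass costs on the order of
100 seconds, while touching every page of an on-disk AST in
random order costs on the order of seven hours. That is the
difference I mean between unpleasant and intolerable.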

Without knowing any other details (which might be critical),
I would be looking at approaches that involve:
  a) running on a 64-bit address machine with, say, 32GB of memory.
  b) taking advantage of compression opportunities here and there
  c) if necessary (might not be), using an implementation language
     that gives you more control over the size/type of data structures.
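
To illustrate (b) and (c), a minimal sketch in C++; the node
layout, field names, and field widths are all hypothetical,
not a proposal for real Ada node kinds:

    #include <cstdint>
    #include <vector>

    // Hypothetical compact AST node: 16 bytes, versus the 40 or
    // more that a heap-allocated, pointer-per-link node costs on
    // a 64-bit machine.
    struct Node {
        std::uint32_t kind       : 8;   // assumes <= 256 distinct node kinds
        std::uint32_t source_pos : 24;  // offset into a per-unit source buffer
        std::uint32_t first_child;      // index into the node array, not a pointer
        std::uint32_t next_sibling;     // ditto; 0 means "none"
        std::uint32_t symbol;           // index into a shared string table
    };

    // All nodes live in one contiguous array; links are 32-bit
    // indices, halving link size and making the AST serializable
    // to disk in a single sequential write.
    using Ast = std::vector<Node>;

At 16 bytes per node and a guessed density of one node per ten
characters of source, 100 million lines at 100 bytes/line works
out to roughly a billion nodes, about 16GB -- which is why the
32GB in (a) is not an extravagant figure. Keeping links as
indices into one contiguous array also speaks to the on-disk
question: the whole array can be written out and read back in a
single sequential pass, with no pointer fixup.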

