[antlr-interest] Can an Antlr Parser return a TreeNodeStream so as to not have to parse the whole file at once?

Mon Apr 16 12:47:43 PDT 2012

On Mon, Apr 16, 2012 at 3:03 PM, Burton Samograd <burton.samograd at markit.com> wrote:

> Hello,
> In the following Antlr example, the parser is used to generate an AST
> which is then converted into a CommonTreeNodeStream, which is then passed
> to the checker.
>
> public static void main(String[] args) throws Exception {
>
> CalcLexer  lex  = new CalcLexer(
>                        new ANTLRInputStream(System.in));
> CommonTokenStream tokens = new CommonTokenStream(lex);
> CalcParser parser = new CalcParser(tokens);
>
> CalcParser.program_result result = parser.program();
> CommonTree tree = (CommonTree) result.getTree();
>
> CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
> CalcChecker checker = new CalcChecker(nodes);
> checker.program();
>
> // Reuse the variable with a fresh node stream over the same tree.
> nodes = new CommonTreeNodeStream(tree);
> CalcInterpreter interpreter = new CalcInterpreter(nodes);
> interpreter.program();
> }
> Is it possible to get the parser to return a CommonTreeNodeStream that can
> be then passed to the Checker so that the whole file does not have to be
> lexed and parsed at once and rather as a stream of tokens and then tree
> nodes?
>
If I understand this correctly, you want to parse the file in pieces and
generate a partial AST for each piece because the file is too large. Since
the lexer has to lex/scan the entire text file to create the tokens for the
parser, you cannot do a partial lexing of the input.

Ter did some work on scannerless parsing several months ago. I have never
worked with it, so I cannot say whether it will help. It is something I
personally would look into for your problem, though I would not expect it
to work. Then again, I have had stranger suggestions work.

I would also profile the running of the grammar to see which part of the
grammar is using too much memory and try altering the grammar and/or adding
actions to correct the problem.
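
To illustrate the kind of restructuring I mean, here is a minimal sketch in
plain Java. It is not ANTLR code and uses no ANTLR API: the class and method
names are made up for illustration, and it assumes a toy input made of
';'-terminated sums of integers. It only shows the general shape of handling
one statement at a time, so that just that statement's data is live in memory
instead of a tree for the whole file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ANTLR API: processes one ';'-terminated statement
// at a time instead of building a structure for the whole input.
public class StatementAtATime {

    // Reads characters up to the next ';' (or end of input).
    // Returns null once only whitespace (or nothing) remains.
    static String nextStatement(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == ';') {
                return sb.toString();
            }
            sb.append((char) c);
        }
        return sb.toString().trim().isEmpty() ? null : sb.toString();
    }

    // Stand-in for "parse, check, interpret": sums the integers in "a + b + c".
    static long evaluate(String statement) {
        long sum = 0;
        for (String term : statement.split("\\+")) {
            String t = term.trim();
            if (!t.isEmpty()) {
                sum += Long.parseLong(t);
            }
        }
        return sum;
    }

    // Handles the input statement by statement; each statement's text can be
    // garbage collected as soon as its result has been recorded.
    static List<Long> run(Reader in) throws IOException {
        List<Long> results = new ArrayList<>();
        String stmt;
        while ((stmt = nextStatement(in)) != null) {
            if (stmt.trim().isEmpty()) {
                continue; // skip empty statements such as ";;"
            }
            results.add(evaluate(stmt));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // prints [3, 60, 4]
        System.out.println(run(new StringReader("1 + 2; 10 + 20 + 30;\n4;")));
    }
}
```

With a real grammar, the analogous change would presumably be to invoke a
per-statement rule in a loop rather than program(), assuming the grammar has
such a rule and that statements can be checked and interpreted independently.
Whether that actually reduces memory use for your grammar is something only
profiling will tell you.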

Usually one wants the entire AST before doing analysis, so I am curious
what you would do with the AST nodes being processed as a stream instead
of as a DOM.

As a worst case, you could override parts of the ANTLR parser with
hand-written code or, even worse, switch to a different type of parser,
e.g. LR, parser combinators, or fully hand-written recursive descent.

You can also contract for support from Ter.

 Eric.

> I ask because we are running into a problem with an extremely large file
> being passed into our Antlr parser and it is causing memory exhaustion in
> the parsing phase. I am thinking that using a TreeNodeStream would solve
> this problem if it is even possible.
> --
> Burton Samograd