[antlr-interest] Very high runtime and memory usage caused by trivial grammar => Is it my mistake?

Cremerius Ralf (DGS-EC/ECC3) Ralf.Cremerius at de.bosch.com
Thu Oct 21 02:11:27 PDT 2010


Hello Bart,

And thank you for your quick answer:

> The rule:
> SOMETOKEN
>  :  ('a')*
>  ;
> matches empty strings, and there are an infinite amount of them before your
> input "<". This is probably the cause of the behavior you're
> witnessing.
I now understand that I was stuck looking at my example grammar in a BNF-like way (seen like that, only "a" and the empty string would have been viable inputs to my grammar). The actual cause of the behaviour is the (inevitable) context-unawareness of the lexer, since I defined a lexer token that can match empty input.
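
For completeness, the grammar that showed this behaviour is essentially the corrected one below, only with '*' instead of '+' in the lexer rule:

--------------------------------------------------------
grammar problem;

main_rule
        :       SOMETOKEN;

SOMETOKEN
        :       ('a')*;
--------------------------------------------------------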

> Change it into 'a'+ and see if things run differently.
I changed the grammar as advised:

--------------------------------------------------------
grammar problem;

main_rule
        :       SOMETOKEN;

SOMETOKEN
        :       ('a')+;
--------------------------------------------------------

When I execute the generated code on the input string "<", the previously mentioned memory usage disappears, but I get a seemingly endless stream of the following output line on my console:

        "line 1:0 required (...)+ loop did not match anything at character '<'"

Up to this point I thought I had understood it. I assume that the above output line is generated by the lexer. I defined only one lexer token in my grammar, and it cannot match an empty string anymore. So why doesn't the lexer stop after unsuccessfully trying to match the input, but keeps running instead?
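
For reference, a minimal driver for running the generated code on that input could look roughly like this (just the standard ANTLR 3 harness; TestProblem is a name I made up, while problemLexer and problemParser are the classes ANTLR generates from the grammar above):

--------------------------------------------------------
import org.antlr.runtime.*;

public class TestProblem {
    public static void main(String[] args) throws Exception {
        // feed the single character "<" to the generated lexer
        ANTLRStringStream input = new ANTLRStringStream("<");
        problemLexer lexer = new problemLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        problemParser parser = new problemParser(tokens);
        parser.main_rule();  // entry rule of the grammar
    }
}
--------------------------------------------------------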

Thanks in advance and best regards,
   Ralf Cremerius

