[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 14:02:45 PDT 2010

Hi.I'm in the process of finishing up the grammar analysis so that I can start on code generation for ANTLR v4. ANTLR v3 lexers were pretty annoying and so I'm doing something a little different this time, though it should work in a backward-compatible way...it will simply work more naturally.  At the moment, I'm looking at the nongreedy loop issue. The following consumes all input then fails since + is greedy:

C : '//' .+ '\n' ;

With the DFA runtime interpreter, I can make this simulate a non-greedy loop but it would consume the rest of the input first and then backtrack. yikes!  I've been reading a lot about the  implementation of non-greedy loops and so on. The easiest way to do it is with an NFA not a DFA. Unfortunately, even with a really clever implementation, the  NFA will be slower. More importantly, I'm approximating recursive lexer rules with a DFA and then will invoke the recursive method at runtime after I've distinguished the input from other rules.  What I mean is that, I really kind of need to build a DFA :)

I looked and neither lex or JavaCC have the non-greedy operator.  They want you to use lexer states, which I will also implement. If you don't want to use a lexer state for the rule above, we can recode it as:

C : '//' ~'\n'+ '\n' ;

Can you folks give me examples that are really difficult to implement without the non-greedy operator? I'm trying to find use cases to push me one direction or the other. Assume you will have lexical states.  The /* ... */ comment is an obvious one I guess that you can implement without a non-greedy loop or a semantic predicate or lexical states.  Hmm...seems a shame to destroy my beautiful DFA for this one case that I can solve easily enough, cutting and pasting again for the rest of my life ;) (or importing it with grammar import statement).

any thoughts are welcome.

Terence