[antlr-interest] greedy vs nongreedy lexer rules
Terence Parr
parrt at cs.usfca.edu
Sun Apr 18 14:02:45 PDT 2010
Hi.I'm in the process of finishing up the grammar analysis so that I can start on code generation for ANTLR v4. ANTLR v3 lexers were pretty annoying and so I'm doing something a little different this time, though it should work in a backward-compatible way...it will simply work more naturally. At the moment, I'm looking at the nongreedy loop issue. The following consumes all input then fails since + is greedy:
C : '//' .+ '\n' ;
With the DFA runtime interpreter, I can make this simulate a non-greedy loop but it would consume the rest of the input first and then backtrack. yikes! I've been reading a lot about the implementation of non-greedy loops and so on. The easiest way to do it is with an NFA not a DFA. Unfortunately, even with a really clever implementation, the NFA will be slower. More importantly, I'm approximating recursive lexer rules with a DFA and then will invoke the recursive method at runtime after I've distinguished the input from other rules. What I mean is that, I really kind of need to build a DFA :)
I looked and neither lex or JavaCC have the non-greedy operator. They want you to use lexer states, which I will also implement. If you don't want to use a lexer state for the rule above, we can recode it as:
C : '//' ~'\n'+ '\n' ;
Can you folks give me examples that are really difficult to implement without the non-greedy operator? I'm trying to find use cases to push me one direction or the other. Assume you will have lexical states. The /* ... */ comment is an obvious one I guess that you can implement without a non-greedy loop or a semantic predicate or lexical states. Hmm...seems a shame to destroy my beautiful DFA for this one case that I can solve easily enough, cutting and pasting again for the rest of my life ;) (or importing it with grammar import statement).
any thoughts are welcome.
Terence
More information about the antlr-interest
mailing list