[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 15:40:53 PDT 2010

On Sun, Apr 18, 2010 at 11:02 PM, Terence Parr <parrt at cs.usfca.edu> wrote:

>
> Can you folks give me examples that are really difficult to implement without the non-greedy operator? I'm trying to find use cases to push me one direction or the other. Assume you will have lexical states.  The /* ... */ comment is an obvious one I guess that you can implement without a non-greedy loop or a semantic predicate or lexical states.  Hmm...seems a shame to destroy my beautiful DFA for this one case that I can solve easily enough, cutting and pasting again for the rest of my life ;) (or importing it with grammar import statement).
>
> any thoughts are welcome.
>
>

Hi,
Some time ago I posted an example here of a construct that, in my
opinion, is hard to get right without non-greedy rules. Let me repost it:

fragment
VerbatimString
  :
  (
    '[' BlanksOrTabs NewLine BlanksOrTabs
    ( options {greedy=false;}:
      ~(
        '\r'
        | '\n'
      )*
      NewLine BlanksOrTabs
    )*
    ']'
  )
  |
  (
    '{' BlanksOrTabs NewLine BlanksOrTabs
    ( options {greedy=false;}:
      ~(
        '\r'
        | '\n'
      )*
      NewLine BlanksOrTabs
    )*
    '}'
  )
  ;

What's going on here is that there are two kinds of strings, delimited
either by '[' ']' or by '{' '}', and the semantics differ depending on
the chosen delimiter. Lexer states could be used to eliminate the
clumsy alternative, I suppose: if you see '{' on input, enter the 1st
mode, otherwise enter the 2nd mode. But the inner loop here is not
solvable with lexer states unless one is willing to duplicate it in
both modes (am I right here?).
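
To make the intended behaviour concrete, here is a rough hand-written
sketch in Python (not ANTLR) of what I read the rule as matching. It
assumes BlanksOrTabs means zero or more spaces or tabs and NewLine means
"\r\n", "\n" or "\r"; those fragment rules are not shown above, so that
is a guess. The function names are made up for illustration. The key
point is the loop decision: the string ends at the first line whose
first non-blank character is the closing delimiter, so a ']' in the
middle of a line does not end it.

```python
# Sketch of the VerbatimString rule as a hand-written matcher.
# Assumptions (not in the original grammar): BlanksOrTabs = (' ' | '\t')*,
# NewLine = '\r\n' | '\n' | '\r'. All names here are hypothetical.

CLOSERS = {"[": "]", "{": "}"}


def skip_blanks(text, i):
    """Return the index of the first character at or after i that is not a blank."""
    while i < len(text) and text[i] in " \t":
        i += 1
    return i


def skip_newline(text, i):
    """Return the index just past a newline starting at i, or -1 if there is none."""
    if text.startswith("\r\n", i):
        return i + 2
    if i < len(text) and text[i] in "\r\n":
        return i + 1
    return -1


def match_verbatim(text, pos=0):
    """Return (opener, lexeme) for a verbatim string starting at pos, else None."""
    if pos >= len(text) or text[pos] not in CLOSERS:
        return None
    opener = text[pos]
    closer = CLOSERS[opener]

    # Opening delimiter, optional blanks, a mandatory newline, optional blanks.
    i = skip_blanks(text, pos + 1)
    i = skip_newline(text, i)
    if i < 0:
        return None
    i = skip_blanks(text, i)

    # Counterpart of the non-greedy loop: exit as soon as the closer is next,
    # otherwise consume one whole line plus the leading blanks of the next one.
    while i < len(text):
        if text[i] == closer:
            return opener, text[pos:i + 1]
        while i < len(text) and text[i] not in "\r\n":
            i += 1
        i = skip_newline(text, i)
        if i < 0:
            return None  # input ended before a closing line was found
        i = skip_blanks(text, i)
    return None


if __name__ == "__main__":
    sample = "[ \n  first ] line\n  second\n]  tail"
    print(match_verbatim(sample))
```

In this hand-written form the line loop is written once and shared by
taking the closing delimiter as a value. Whether lexer states can share
the loop in a comparable way, without copying it into each mode, is
exactly the question above.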