[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 16:15:40 PDT 2010

On Mon, Apr 19, 2010 at 1:02 AM, Terence Parr <parrt at cs.usfca.edu> wrote:
> Hi Marcin,
>
> First, can you do this in v3?
>
> fragment
> VerbatimString
>  :
>  (
>   '[' GUTS ']'
>  )
>  |
>  (
>   '{' GUTS '}'
>  )
>  ;
>
> fragment
> GUTS : BlanksOrTabs NewLine BlanksOrTabs
>   ( options {greedy=false;}:
>     ~(
>       '\r'
>       | '\n'
>      )*
>     NewLine BlanksOrTabs
>   )*
> ;
>
> Then, with lexical modes, you'd share the same mode for the inside/guts.

I am not sure that it works. As far as I remember, I tried that and,
looking at the generated code, I saw that GUTS was not using 'follow'
information, so it did not really know when to leave the loop. That was
my question in the original post where I brought this issue up. I wrote
then:

My initial grammar:
fragment
VerbatimString
  :
  (
    '[' LineSequence ']'
  )
  |
  (
    '{' LineSequence '}'
  )
  ;

fragment
LineSequence
  :
  NewLine
    (
      ~(
        '\r'
        | '\n'
       )+
      NewLine
    )*
  ;

The intention was to match strings that start with a separator ('['
or '{') followed by a newline and extend to the closing separator
preceded by a newline. Of course, it isn't exactly correct, as you
probably spotted. The problem was that LineSequence would gladly
consume the closing separator after the newline. So it seemed logical
to make the loop in LineSequence non-greedy, so that it examines what
follows and leaves as soon as the closing separator appears in the
lookahead. I tried:
fragment
LineSequence
  :
  NewLine
    ( options {greedy=false;}:
      ~(
        '\r'
        | '\n'
       )+
      NewLine
    )*
  ;

and I realized that ANTLR does not really inspect the follow set; it
seems to inspect only what is left in the rule itself.
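
To make the difference concrete, here is a minimal Python sketch of the
two loop-exit decisions. It is a hand-written illustration, not ANTLR's
generated code. The function names and the `closers` parameter are
hypothetical, and it assumes NewLine means '\r\n', '\r' or '\n' (the
rule is not shown in this thread).

```python
def match_newline(text, pos):
    """Assumed NewLine rule: '\\r\\n', '\\r' or '\\n'."""
    if text.startswith("\r\n", pos):
        return pos + 2
    if pos < len(text) and text[pos] in "\r\n":
        return pos + 1
    raise ValueError(f"expected newline at offset {pos}")


def match_line_sequence(text, pos, closers=""):
    """NewLine ( ~('\\r'|'\\n')+ NewLine )*

    `closers` is a hypothetical stand-in for follow-set lookahead:
    the loop exits when a line starts with one of these characters.
    With closers="" the loop decides from the rule's own content only.
    """
    pos = match_newline(text, pos)
    while (pos < len(text)
           and text[pos] not in "\r\n"
           and text[pos] not in closers):
        # ~('\r'|'\n')+ : consume the rest of the line
        while pos < len(text) and text[pos] not in "\r\n":
            pos += 1
        pos = match_newline(text, pos)
    return pos


def match_verbatim(text, closers=""):
    """'[' LineSequence ']'  |  '{' LineSequence '}'"""
    pairs = {"[": "]", "{": "}"}
    if not text or text[0] not in pairs:
        raise ValueError("expected '[' or '{' at offset 0")
    close = pairs[text[0]]
    pos = match_line_sequence(text, 1, closers)
    if pos < len(text) and text[pos] == close:
        return pos + 1
    raise ValueError(f"expected {close!r} at offset {pos}")


text = "[\nfoo\nbar\n]"

# Without follow information the loop swallows ']' as line content
# and then fails looking for a newline.
try:
    match_verbatim(text)
except ValueError as err:
    print("rule-local decision:", err)

# With the closing separators as lookahead the loop stops in time.
print("follow-aware decision:", match_verbatim(text, closers="]}"))
```

With `closers=""` the loop sees ']' as an ordinary non-newline
character, which is the behaviour described above. Passing the closing
separators makes the loop leave before consuming them, which is what I
expected the non-greedy option to do.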

Based on this, I believe your proposal wouldn't work.

-- 
Regards
Marcin Rzeźnicki