[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 16:02:39 PDT 2010

Hi Marcin,

First, can you do this in v3?

fragment
VerbatimString
 :
 (
   '[' GUTS ']'
 )
 |
 (
   '{' GUTS '}'
 )
 ;

fragment
GUTS : BlanksOrTabs NewLine BlanksOrTabs
   ( options {greedy=false;}:
     ~(
       '\r'
       | '\n'
      )*
     NewLine BlanksOrTabs
   )*
;

Then, with lexical modes, both delimiters would enter the same mode for the inside (the guts), so the loop is written only once.
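
As a rough sketch of that idea, here is how a shared mode could look in the lexer-mode syntax that ANTLR 4 eventually adopted (pushMode/popMode). The grammar name, the token names, and the bodies of BlanksOrTabs and NewLine are assumptions made for illustration, since the original fragments are not shown in this thread. Note one simplification: the shared mode accepts either closing delimiter, so checking that '[' pairs with ']' and '{' with '}' is left to the parser.

```antlr
lexer grammar VerbatimStrings;   // hypothetical grammar name

// Either opening delimiter enters the same mode, so the guts are written once.
OPEN_BRACKET : '[' BlanksOrTabs NewLine -> pushMode(GUTS) ;
OPEN_BRACE   : '{' BlanksOrTabs NewLine -> pushMode(GUTS) ;

// Assumed definitions; the real fragments may differ.
fragment BlanksOrTabs : [ \t]* ;
fragment NewLine      : '\r'? '\n' ;

mode GUTS;

// A closing delimiter as the first non-blank character of a line ends the string.
CLOSE_BRACKET : BlanksOrTabs ']' -> popMode ;
CLOSE_BRACE   : BlanksOrTabs '}' -> popMode ;

// Any other line is content: either a blank line, or a line whose first
// non-blank character is not a closing delimiter.
GUTS_LINE
    : BlanksOrTabs NewLine
    | BlanksOrTabs ~[ \t\r\n\]}] ~[\r\n]* NewLine
    ;
```

Excluding the closing delimiters from the first non-blank position of GUTS_LINE stands in for the non-greedy loop: without it, longest-match would let a content line swallow the terminator.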

Ter
On Apr 18, 2010, at 3:40 PM, Marcin Rzeźnicki wrote:
> Hi,
> I once posted an example here of a construct that, in my
> opinion, is hard to get right without non-greedy rules. Let me repost it:
> 
> fragment
> VerbatimString
>  :
>  (
>    '[' BlanksOrTabs NewLine BlanksOrTabs
>    ( options {greedy=false;}:
>      ~(
>        '\r'
>        | '\n'
>       )*
>      NewLine BlanksOrTabs
>    )*
>    ']'
>  )
>  |
>  (
>    '{' BlanksOrTabs NewLine BlanksOrTabs
>    ( options {greedy=false;}:
>      ~(
>        '\r'
>        | '\n'
>       )*
>      NewLine BlanksOrTabs
>    )*
>    '}'
>  )
>  ;
> 
> What's going on here is that you may have two kinds of strings,
> delimited either by '[' ']' or by '{' '}', and the semantics
> depend on the chosen delimiter. Lexer states could be used to
> eliminate the clumsy alternative, I suppose: if you see '{' on the
> input, enter the first mode; otherwise enter the second mode. But the
> inner loop here is not solvable with lexer states unless one is willing
> to duplicate it in both modes (am I right here?).