[antlr-interest] Re: ANTLR 3.0 status: got nongreedy loops going

thrutchy eric_mahurin at yahoo.com
Sun Aug 1 16:13:44 PDT 2004


I'm basically suggesting syntax similar to what in java.util.regex (
http://java.sun.com/docs/books/tutorial/extra/regex/quant.html ). 
Here's a little table:

new      old
---      ---
(x)??    {i=0} ( options{greedy=false;}: {!i}? x {++i;} )*
(x)*?    ( options{greedy=false;}: x )*
(x)+?    ( options{greedy=false;}: x )+
(x)?+    ( options{warnWhenFollowAbiguous=false};}: x )?
(x)*+    ( options{greedy=true;}: x )*
(x)++    ( options{greedy=true;}: x )+

As you can see, it's a hack now to do a non-greedy/reluctant
conditional.  I faked a ( )? using a ( )*.

You can reuse ?/+ as suffixes since if they follow ?/*/+ they would
make no since otherwise.  Also, it's nice to reuse stuff from an
existing standard.  In this case regex.

If you are thinking about backtracking with any of these, you could
use a "*" suffix to ?/*/+ to mean do backtracking, since we've already
used ?/+ suffixes.  If you did that, you'd have quite a bit of options
just for loop:

rule     result when something can match x or y
----     --------------------------------------
(x)* y   Warn (or error?) of ambiguity
(x)*? y  use y (non-greedy/reluctant w/o backtracking)
(x)*+ y  use x (regex possessive)
(x)** y  prefer x but use y if it fails later (regex default)
(x)*?* y prefer y but use x if it fails later (regex reluctant)

Feel free to ignore my ramblings above about backtracking since there
probably isn't much usefulness for it :)  I just thought I mention it
since we are on the topic of greediness and backtracking is a related
thing.

I definitely don't think you should make turn off the ambiguity
warnings.  Overall I find them useful.

Eric

--- In antlr-interest at yahoogroups.com, Terence Parr <parrt at c...> wrote:
> 
> On Jul 31, 2004, at 7:22 PM, thrutchy wrote:
> 
> > A few things about greediness:
> >
> > - how about supporting regex-like "?" modifier (greedy=false):
> >
> > CMT : "/*" ( . )*? "*/" ;
> 
> you mean instead of the verbose greedy=false?  Well, ? means optional 
> in my meta-language.
> 
> > - in antlr 2.7.4 warnings and docs, it says greediness doesn't make
> > sense with the optional ()? construct, but it does (regex's use ??).
> 
> Greedy is more about finding longest match in normal regexs, but it 
> requires backtracking.  I use it to indicate how to resolve a 
> deterministic lookahead decision.
> 
> > greedy=true would be the same as warnWhenFollowAbiguous=false, and
> > non-greedy would be the same as not matching the optional construct if
> > it matches what's next.
> 
> I'm hoping that there will just generally be less bitching from antlr 
> unless there is something that really needs your attention :)
> 
> > - also, the java regex "+" greedy=true modifier would be nice:
> >
> > "if" LPAREN expr RPAREN stat ( "else" stat )?+
> 
> That means to match greedily (the default) and shut up about it right?
> 
> > I find the majority of times I use options it can be boiled down to
> > greediness, so the regex shorthands would be nice.
> 
> I see...  Well, i'm really tempted to simply turn off warning emanating 
> from subrules vs "what follows".  It will always default to greedy 
> anyway, so might as well just ignore the warning.
> 
> Ter
> 
> >
> > Eric
> >
> >
> > --- In antlr-interest at yahoogroups.com, Terence Parr <parrt at c...>
wrote:
> >> Howdy,
> >>
> >> Spent 3 days thinking this week and one hour coding to get nongreedy
> >> loops going properly.  ANTLR lexers are much easier to specify now.
> >> For example, here is a sample grammar I'm working with:
> >>
> >> lexer grammar L;
> >>
> >> IF : "if" ;
> >> ID : ('a'..'z')+ ;
> >> WS : (' '|'\n')+ ;
> >> CMT : "/*" ( greedy=false : . )* "*/" ;
> >>
> >> It properly deals with IF vs ID and it handles the CMT rule properly.
> >> It stops when reading when it sees "*/".  Here is the test example:
> >>
> >> java TestLexer "bbd if /* * / ** foo */ abc"
> >>
> >> [bbd/65538,0:0]
> >> [ /65539,0:0]
> >> [if/65536,0:0]
> >> [ /65539,0:0]
> >> [/* * / ** foo *//65540,0:0]
> >> [ /65539,0:0]
> >> [abc/65538,0:0]
> >>
> >> TestLexer is just a loop that prints out Token objects.
> >>
> >>          IntegerStream charStream = new ANTLRStringStream(args[0]);
> >>          L lexer = new L(charStream);
> >>          Token t = lexer.nextToken();
> >>          while ( t.getType()!= IntegerStream.EOF ) {
> >>                  System.out.println(t.toString());
> >>                  t = lexer.nextToken();
> >>          }
> >>
> >> I feel confident that soon I'll be able to handle the Java
grammar. :)
> >>
> >> BTW, org.antlr.runtime.* is only 370 lines of code so far. :)
> >>
> >> runtime/ANTLRStringStream.java
> >> runtime/CommonToken.java
> >> runtime/CommonTokenStream.java
> >> runtime/DFA.java
> >> runtime/IntegerStream.java
> >> runtime/Lexer.java
> >> runtime/Parser.java
> >> runtime/Token.java
> >> runtime/TokenSource.java
> >> runtime/TokenStream.java
> >>
> >> L8R,
> >> Ter
> >> --
> >> CS Professor & Grad Director, University of San Francisco
> >> Creator, ANTLR Parser Generator, http://www.antlr.org
> >> Cofounder, http://www.jguru.com
> >> Cofounder, http://www.knowspam.net enjoy email again!
> >> Cofounder, http://www.peerscope.com pure link sharing
> >
> >
> >
> >
> > Yahoo! Groups Links
> >
> >
> >
> >
> >
> >
> --
> CS Professor & Grad Director, University of San Francisco
> Creator, ANTLR Parser Generator, http://www.antlr.org
> Cofounder, http://www.jguru.com
> Cofounder, http://www.knowspam.net enjoy email again!
> Cofounder, http://www.peerscope.com pure link sharing



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list