[antlr-interest] Lookbehind and other regex features

Wed Jul 14 05:08:16 PDT 2004

I posted this on jguru, but it seemed inactive, so I'm trying here.

I'm an ANTLR beginner, but a long-time expert in Perl and its regular
expressions, including the constructs that have embedded code.

Is there any way to accomplish lookbehind in the lexer or parser? It
is the main thing I could do with Perl's regexes that I don't
immediately see how to do in ANTLR.

Here is a complete list of things easily done in Perl's regexes that
would be nice to have in ANTLR:

(...){m,n} : Match at least m times and at most n. m and n should be
allowed to be, at minimum, integer variables of the target language.
n=-1 could represent infinity, so that {0,-1} means * and {1,-1} means
+ ({0,1} means ?). m=-1 could be used to force the alternative to
fail, like a semantic predicate. I realize this should be doable now
with semantic predicates watching a counter inside a * loop, but a
more concise notation would be nice, and one less tied to the target
language.
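
To make the counter idea concrete, here is a rough sketch in Python
rather than in an ANTLR grammar. The function name match_repeated and
its parameters are made up for illustration; the -1 conventions are
the ones proposed above, and the loop condition stands in for the
semantic predicate on a counter. Like an ANTLR loop, it does not
backtrack.

```python
def match_repeated(text, pos, item, m, n):
    """Match the literal `item` between m and n times starting at text[pos].

    Returns the position just past the last repetition, or None on failure.
    Hypothetical conventions: n == -1 means no upper bound, m == -1 forces
    the match to fail.
    """
    if not item:
        raise ValueError("item must be non-empty")
    if m == -1:
        return None
    count = 0
    # This condition plays the role of a semantic predicate on a counter.
    while (n == -1 or count < n) and text.startswith(item, pos):
        pos += len(item)
        count += 1
    return pos if count >= m else None


two_or_three = match_repeated("abababab", 0, "ab", 2, 3)  # stops after 3
too_few = match_repeated("ab", 0, "ab", 2, 3)
star = match_repeated("xyz", 0, "ab", 0, -1)              # like (ab)*
forced_fail = match_repeated("abab", 0, "ab", -1, 3)
```

Here {2,3} consumes three of the four "ab" repetitions and stops, a
single "ab" fails the lower bound, {0,-1} succeeds on zero
repetitions, and m=-1 fails regardless of the input.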

(...){m,n}? (...)*? (...)+? ?? : Non-greedy matching. I know
"options{greedy=false;}" works, but non-greedy matching seems common
enough to deserve a suffix. Maybe even have another suffix that sets
greedy=true to silence the default warning - how about "+", giving
{}+, *+, ++, ?+. I guess there are several ways to deal with
greediness: warn if it matters (ANTLR's default), greedy ignoring
following patterns (ANTLR greedy=true, perl (?>...)), greedy with
backtracking (ANTLR doesn't do it, perl's default), non-greedy with
backtracking (perl ? suffix to {}, *, +, and ?), and non-greedy w/o
backtracking (ANTLR greedy=false, perl ? suffix to {}, *, +, and ?
within (?>...)). I realize backtracking is difficult and slow, but it
would be nice to have on occasion. Syntactic and semantic predicates
do go some way toward replacing it. One more thing: I'm not sure why
ANTLR doesn't allow non-greedy with ? (perl's ??), because it does
make sense - match the optional pattern only if the lookahead doesn't
match whatever follows it.
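
For readers less familiar with the Perl behavior being described, a
small sketch using Python's re module, whose greedy and non-greedy
quantifiers follow the same backtracking rules as Perl's for these
cases. The sample strings are made up. Atomic groups and possessive
quantifiers are left out because Python's re only gained them in
version 3.11.

```python
import re

text = "<a><b>"

# Greedy with backtracking: .+ grabs as much as it can, then gives back
# just enough for the final ">" to match.
greedy = re.match(r"<.+>", text).group()

# Non-greedy with backtracking: .+? takes as little as it can.
lazy = re.match(r"<.+?>", text).group()

# Greedy "?" takes the optional item when it can; non-greedy "??" skips
# it whenever the rest of the pattern can still match.
opt_greedy = re.match(r"(a?)(a*)", "aaa").groups()
opt_lazy = re.match(r"(a??)(a*)", "aaa").groups()

# Backtracking is what lets a* give back one "a" for the trailing "a".
needs_backtracking = re.fullmatch(r"a*a", "aaa") is not None
```

The greedy pattern matches the whole of "<a><b>", the non-greedy one
only "<a>". With "??" the first group is empty and the second takes
all three characters.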

(?<=...) (?<!...): Positive and negative lookbehind assertions. I
don't know of a way to do this in ANTLR.
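
To show what is being asked for, a short sketch in Python, whose re
module uses the same lookbehind syntax as Perl (fixed-width only). The
sample text and the helper digits_after_dollar are made up; the helper
only illustrates that, in a hand-written scanner, lookbehind amounts
to inspecting the character before the current position. It is not a
claim about how ANTLR could or should implement it.

```python
import re

sample = "cost $15, qty 3"

# Positive lookbehind: digits immediately preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", sample)

# Negative lookbehind: digit runs NOT preceded by "$" or another digit.
not_after_dollar = re.findall(r"(?<![$\d])\d+", sample)


def digits_after_dollar(text):
    """Hand-rolled equivalent of the positive lookbehind above."""
    found = []
    i = 0
    while i < len(text):
        # "Look behind" by checking the previous character directly.
        if text[i].isdigit() and i > 0 and text[i - 1] == "$":
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            found.append(text[i:j])
            i = j
        else:
            i += 1
    return found


by_hand = digits_after_dollar(sample)
```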

(?=...) (?!...): Positive and negative lookahead assertions. I know
semantic predicates can cover this (syntactic predicates may also
cover the positive lookahead assertions, but I'm not sure). A more
readable way of writing them, like this, would be nice.
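
As an illustration of the two assertions, a minimal sketch using
Python's re module, which shares Perl's lookahead syntax. The sample
expression is made up. The lookahead checks what follows without
consuming it, which is roughly the job a predicate would do.

```python
import re

expr = "f(x) + g(y) + z"

# Positive lookahead: names followed by "(", without consuming the "(".
calls = re.findall(r"\w+(?=\()", expr)

# Negative lookahead: whole names NOT followed by "(". The \b anchors
# keep the engine from backtracking to a shorter prefix of a name.
plain_names = re.findall(r"\b\w+\b(?!\()", expr)
```

Here calls picks out f and g, while plain_names picks out x, y and z.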
