[antlr-interest] Comment rule matches links
Martin Probst
mail at martin-probst.com
Tue Aug 26 01:34:20 PDT 2008
>> What kind of "further issues" is it causing?
>
> My problem is regular expressions that match quotes, like this one:
>
> replace(/"/, "&quot;");
>
> In this case, the STRING rule matches everything from the first to the
> second quote, which is "/, ", and then matches everything from the
> last quote sign up to whichever quote comes next.
> I already found the article about island grammars
> (http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control),
> but I have no idea how to apply that solution to my problem, because
> the workaround is for parser grammars and my STRING / COMMENT rules are
> still part of the lexer.
For the regular expression literal problem, I think you will
absolutely have to implement something like island grammars;
otherwise everything else will only get more and more fragile and
hackish.
I've done something similar in my XQuery parser, where I swap the
lexer when the parser enters a particular grammar rule. In your case,
you would add embedded actions to your regular expression literal rule:
regExp: '/' { enterRegexp(); } regExpTokens '/' { leaveRegexp(); };
Then, in enterRegexp(), you swap in a dedicated lexer that handles
regular expressions (or, if you just want them as an opaque string,
one that simply eats everything up to the next unescaped slash), and
leaveRegexp() switches back to the regular lexer.
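A minimal, self-contained sketch of the lexer swap, under stated assumptions: it does not use the ANTLR runtime, both "lexers" are toy hand-written scanners, and the names `Input`, `Lexer`, `OUTER`, `REGEXP` and `tokenize` are hypothetical. Only `enterRegexp()`/`leaveRegexp()` come from the rule above. The two lexers share one input cursor so each resumes where the other stopped, and the `tokenize` driver stands in for the parser; unlike a real parser, it treats every `/` as the start of a regexp literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ANTLR runtime code: two toy lexers share one input
// cursor, and enterRegexp()/leaveRegexp() swap which one produces tokens.
final class LexerSwitchDemo {
    /** Shared input and cursor, so each lexer resumes where the other stopped. */
    static final class Input {
        final String text;
        int pos = 0;
        Input(String text) { this.text = text; }
        boolean atEnd() { return pos >= text.length(); }
        char peek() { return text.charAt(pos); }
    }

    /** Returns the next token's text; OUTER returns null at end of input. */
    interface Lexer { String nextToken(Input in); }

    /** Toy outer-language lexer: identifiers, "..." strings, single characters. */
    static final Lexer OUTER = in -> {
        while (!in.atEnd() && Character.isWhitespace(in.peek())) in.pos++;
        if (in.atEnd()) return null;
        int start = in.pos;
        if (in.peek() == '"') {
            in.pos++;                                   // opening quote
            while (!in.atEnd() && in.peek() != '"') in.pos++;
            if (!in.atEnd()) in.pos++;                  // closing quote
        } else if (Character.isLetter(in.peek())) {
            while (!in.atEnd() && Character.isLetterOrDigit(in.peek())) in.pos++;
        } else {
            in.pos++;                                   // any other single character
        }
        return in.text.substring(start, in.pos);
    };

    /** Regexp lexer: the whole body up to (not including) the next unescaped '/'. */
    static final Lexer REGEXP = in -> {
        int start = in.pos;
        while (!in.atEnd() && in.peek() != '/') {
            if (in.peek() == '\\' && in.pos + 1 < in.text.length()) in.pos++; // skip escaped char
            in.pos++;
        }
        return in.text.substring(start, in.pos);
    };

    private final Input input;
    private Lexer current = OUTER;

    LexerSwitchDemo(String text) { input = new Input(text); }

    void enterRegexp() { current = REGEXP; }
    void leaveRegexp() { current = OUTER; }
    String nextToken() { return current.nextToken(input); }

    /** Stands in for the parser. Toy assumption: every '/' opens a regexp literal. */
    static List<String> tokenize(String text) {
        LexerSwitchDemo lexer = new LexerSwitchDemo(text);
        List<String> tokens = new ArrayList<>();
        String t;
        while ((t = lexer.nextToken()) != null) {
            tokens.add(t);
            if (t.equals("/")) {
                lexer.enterRegexp();
                tokens.add(lexer.nextToken());          // regexp body as one token
                lexer.leaveRegexp();
                String close = lexer.nextToken();       // closing '/', lexed by OUTER again
                if (close != null) tokens.add(close);
            }
        }
        return tokens;
    }
}
```

For the input `replace(/"/, "&quot;");` this yields the tokens `replace`, `(`, `/`, `"`, `/`, `,`, `"&quot;"`, `)`, `;`, so the quote inside the slashes never reaches the STRING logic. In a real ANTLR setup, check how far ahead your token stream has already lexed before swapping; a buffering stream may have consumed the regexp body with the outer lexer already.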
Basically you have two languages: your regular outer language and,
within the slashes, the regexp language, which is syntactically
completely different. So IMHO the easiest way around that is to have a
separate lexer/token source that takes over in that area.
This can be as simple as writing your own Lexer in Java that just
returns the full string up to the next unescaped slash.
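A minimal sketch of that hand-written approach, assuming regexp bodies only use backslash escapes. The class and method names are hypothetical, and this is plain Java rather than ANTLR's Lexer or TokenSource API:

```java
// Hypothetical helper, not part of ANTLR: the "opaque string" variant of the
// regexp lexer. It only knows about backslash escapes and the closing slash.
final class RegexpBodyScanner {
    /**
     * Returns the text from {@code start} up to (not including) the next
     * unescaped '/'. Escape sequences are kept verbatim, so the closing slash
     * sits at index start + body.length().
     *
     * @throws IllegalArgumentException if no unescaped '/' follows
     */
    static String scan(CharSequence input, int start) {
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                i += 2;                 // skip the backslash and the character it escapes
            } else if (c == '/') {
                return input.subSequence(start, i).toString();
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException(
            "unterminated regular expression literal starting at index " + start);
    }
}
```

A hand-written token source would call `scan(text, indexAfterOpeningSlash)` (a hypothetical variable), emit the result as a single regexp token, and hand control back to the outer lexer at the closing slash.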
HTH,
Martin